NASA-CR-170119
19830013570 


A Reproduced Copy 

Reproduced for NASA 
by the 

NASA Scientific and Technical Information Facility


(NASA-CR-170119) AN OVERVIEW OF COMPUTER
VISION Final Report (National Bureau of
Standards) 109 p HC A06/MF A01 CSCL OSB




H83-2184J 

Unclas 
095', 9 








mm 02-2002 








William B. Gevarter*


U.S. DEPARTMENT OF COMMERCE 
National Bureau of Standards 
National Engineering Laboratory
Center for Manufacturing Engineering 
Industrial Systems Division 
Metrology Building, Room A127 
Washington, DC 20234 

September 1982


Prepared for: 

National Aeronautics and Space Administration

Headquarters

Washington, DC 20546 




U.S. DEPARTMENT OF COMMERCE, Malcolm Baldrige, Secretary

NATIONAL BUREAU OF STANDARDS, Ernest Ambler, Director

*Research Associate at the National Bureau of Standards, sponsored by NASA Headquarters


ORIGINAL PAGE IS
OF POOR QUALITY 


Preface

Computer Vision ** 

Computer Vision -- visual perception employing computers --
shares with "Expert Systems" the role of being one of the most
popular topics in Artificial Intelligence today. Commercial
vision systems have already begun to be used in manufacturing and
robotic systems for inspection and guidance tasks. Other systems,
at various stages of development, are beginning to be employed in
military, cartographic and image interpretation applications.

This report reviews the basic approaches to such systems,
the techniques utilized, applications, currently existing
systems, the state of the art of the technology, issues and
research requirements, who is doing it and who is funding it, and
finally, future trends and expectations.

The computer vision field is multifaceted, with many
participants holding diverse viewpoints and a large body of
published papers. However, the field is still in the early stages
of development: organizing principles have not yet crystallized,
and the associated technology has not yet been rationalized. Thus,
this report is not as smooth and even as would be desirable.
Nevertheless, this overview should prove useful to engineering
and research managers, potential users and others who will be
affected by this field as it unfolds.


**This report is in support of the more general NBS/NASA report,
An Overview of Artificial Intelligence and Robotics.



Acknowledgments 


I wish to thank the many individuals and organizations who
have contributed to this report by furnishing information and
suggestions. I particularly would like to thank Drs. T. Binford
of Stanford, M. Brady of MIT, M. Tenenbaum and H. Barrow of
Fairchild, and Gennery et al. of JPL for furnishing source material
that was essential to the development of this report. In
addition, I would like to thank the staff of the NBS Industrial
Systems Division, CMDR R. Ohlander of DARPA, Drs. A. Rosenfeld
and M. Schneer of U. of MD., M. Fischler of SRI, E. Sacerdoti of
M.I.C. and J. Wilder of Object Recognition Systems for reviewing
this report and suggesting corrections and modifications.
However, any remaining errors or omissions remain the
responsibility of the author. I would also like to thank Margie
Johnson for typing and facilitating the publication of this NBS
series of reports on Artificial Intelligence and Robotics.

It is not the intent of the National Bureau of Standards to 
recommend or endorse any of the manufacturers or organizations 
named in this report, but simply to attempt to provide an 
overview of the computer vision field. However, in such a 
diverse and rapidly changing field, important activities and 
products may not have been mentioned. Lack of such mention does 
not in any way imply that they are not also worthwhile. The
author would appreciate having any such omissions or oversights
called to his attention so that they can be considered for future
reports.




Table of Contents

Preface

I. Introduction

II. Definition

III. Origins of Computer Vision

IV. Relation to Human Vision

V. Applications

VI. Basis for a General Purpose Image Understanding System

VII. Basic Paradigms for Computer Vision
    A. Hierarchical Bottom-up Approach
    B. Hierarchical Top-down Approach
    C. Heterarchical Approach
    D. Blackboard Approach

VIII. Levels of Representation

IX. Research in Model-Based Vision Systems

X. Industrial Vision Systems
    A. General Characteristics
    B. Examples of Efforts in Industrial Visual Inspection Systems
    C. Examples of Efforts in Industrial Visual Recognition and
       Location Systems
    D. Commercially Available Industrial Vision Systems

XI. Who is Doing It
    A. Research Oriented
    B. Commercial Vision Manufacturers

XII. Who is Funding It

XIII. Summary of the State-of-the-Art
    A. Human Vision
    B. Low and Intermediate Levels of Processing
    C. Industrial Vision Systems
    D. General Vision Systems
        1. Introduction
        2. Difficulties
        3. Techniques
        4. Conclusions
    E. Visual Tracking
    F. Overview

XIV. Current Problems and Issues
    A. General
    B. Techniques
        1. Low Level Processing
        2. Middle Level Processing
        3. High Level Processing
    C. Representation and Modeling
    D. System Paradigms and Design
    E. Knowledge Acquisition -- Teaching and Programming
    F. Sensing
    G. Planning

XV. Research Needed
    A. General
    B. Techniques
        1. Low Level Processing
        2. Middle Level Processing
        3. High Level Processing
    C. Representation and Modeling
    D. System Paradigms and Design
    E. Knowledge Acquisition -- Teaching and Programming
    F. Sensing
    G. Planning

XVI. Future Trends
    A. Techniques
    B. Hardware and Architecture
    C. AI and General Vision Systems
    D. Modeling and Programming
    E. Knowledge Acquisition
    F. Sensing
    G. Industrial Vision Systems
    H. Future Applications
    I. Conclusion

References

Appendices

A. Low Level Features and Representations
    A. Pixels
    B. Texture
    C. Regions
    D. Edges and Lines
    E. Corners

B. Extracting Edges and Areas
    A. Extracting Edges
        1. Linear Matched Filtering
        2. Non-Linear Filtering
        3. Local Thresholding
        4. Surface Fitting
        5. Rotationally Insensitive Operators
        6. Line Following
        7. Global Methods
    B. Edge Finding Variations
    C. Linking Edge Elements and Thinning Resultant Lines
    D. Remarks on Edge Finding
    E. Extracting Regions

C. Segmentation and Interpretation
    A. The Computer Vision Paradigm
    B. An Early Bottom-up System
    C. Problems with Bottom-up Systems
    D. Interpretation-Guided Segmentation
    E. Use of General World Knowledge to Guide Segmentation

D. 2-D Representation, Description and Recognition
    A. Pyramids
    B. Quadtrees
    C. Statistical Features of a Region
    D. Boundary Curves
    E. Run-Length Encoding
    F. Skeleton Representations and Generalized Ribbons
    G. Representation by a Concatenation of Primitive Forms
    H. Relational Graphs
    I. Recognition

E. Recovery of Intrinsic Image Characteristics
    A. Basic Approach
    B. Shape from Shading
    C. Stereoscopic Approach
    D. Photometric Stereo
    E. Shape from Texture
    F. Shape from Contour
    G. Shape and Velocity from Motion

F. Higher Levels of Representation
    A. Volumetric Models
        1. Generalized Cones
        2. Wire Frame Models
        3. Polyhedral Models
        4. Combining 1D, 2D, and 3D Primitives
        5. Planes and Ellipsoids
        6. Sets of Prototype Volumes
    B. Symbolic Descriptions
    C. Procedural Models

G. Higher Levels of Interpretation

H. Tracking

I. Additional Tables of Model-Based Vision

J. Tables of Commercially Available Systems

K. Glossary

L. Publication Sources for Further Information




Figures

1. A Framework for Early and Intermediate States in a Theory of
   Visual Information Processing

2. An Example of a 2-1/2D Sketch

3. Examples of Applications of Computer Vision Now Underway

4. Model-Based Interpretation of Images

5. Basic Image Understanding Paradigms

6. Computational Architecture for a General-Purpose Vision System

7. Organization of a Visual System

8. Intensity Variations at Step Edges

9. A Set of Intrinsic Images Derived from a Single Monochrome
   Intensity Image




Tables

I. Examples of Non-Linear Filtering for Extracting Edge and Line
   Elements

II-1. Model-Based Vision Systems -- ACRONYM

II-2. Model-Based Vision Systems -- VISIONS

III. [Additional] Model-Based Vision Systems

IV. Example Efforts in Industrial Visual Inspection Systems

V. Example Efforts in Industrial Visual Recognition and Location
   Systems

VI. Commercially Available Industrial Vision Systems

VII. University Organizations Engaged in Computer Vision Research

VIII. Commercial Vision System Developers

IX. Visual Tracking Approaches

X. Example Future Applications for Computer Vision Systems



Computer Vision 


I. Introduction

Following the lead of Cohen and Feigenbaum (1982, p. 127) we
may consider computer vision to be the information-processing
task of understanding a scene from its projected images. Other
fields such as image processing and pattern recognition also
utilize computers in vision tasks. However, we can distinguish
the fields by categorizing them as follows:

Image processing is a signal processing task that transforms 
an input image into a more desirable output image through 
processes such as noise reduction, contrast enhancement and 
registration.

Pattern recognition is a classification task that classifies 
images into predetermined categories. 

Computer vision is an image understanding task that
automatically builds a description not only of the image itself,
but of the three-dimensional scene that it depicts. The term
scene analysis has been used in the past to emphasize the
distinction between processing two-dimensional images, as in
pattern classification, and seeking information about
three-dimensional scenes.

In this report, we will emphasize the Artificial
Intelligence (AI) aspects of vision and therefore will dwell on
image understanding. Image understanding includes among its
techniques many of the methods found in image processing and
pattern recognition. However, it also includes geometric
modeling, and AI knowledge representation and cognitive
processing techniques.
Hiatt (1981, p. 3) observes, "For practical purposes,
investigators of computer vision often define seeing as gathering
visual data for the purpose of making complex decisions.
Computer vision is accordingly, a major adjunct to the study of
artificial intelligence." Arden (1980, p. 482) adds, "A view
widely held by psychologists is that perception is an active
process in which hypotheses are formed about the nature of the
environment and sensory information is sought that will confirm
or refute these hypotheses. This view of perception, as a form
of problem-solving at least at some stage, is held by many
researchers in artificial intelligence." Thus, computer vision
with its many current and potential applications is a major
Artificial Intelligence (AI) topic today. The following chapters
are an attempt to provide an overview of this important and
growing field. In addition to reviewing the conceptual basis for
computer vision and its associated techniques, we will also
review their implementation in vision systems, both research and
commercial.

Chapters II, III and IV further define computer vision,
reviewing its origins and its relation to human vision. Chapter
V briefly indicates applications of computer vision.

Chapter VI outlines a basis for a general purpose computer
vision system, in the process providing a structure for
comprehending systems with lesser aspirations.



Chapter VII reviews the basic control structures suitable 
for vision systems. Chapter VIII examines the successive levels 
of representation found in computer vision systems. 

Vision systems, both research and industrial, are covered in
Chapters IX and X. Information on who the principal participants
are in the computer vision field is given in Chapters XI and XII.

The state of the art, current problems and issues, research
requirements and future trends are presented in Chapters XIII to
XVI.

Reviews of the representation methods and processing 
techniques used in computer vision are given in the appendices. 

Appendix A reviews representations for low level image
features such as pixels, edges, regions, etc.

Appendix B reviews techniques (such as filtering and 
thresholding) for extracting edges and regions. 

Appendix C discusses methods for symbiotically combining
image segmentation with interpretation.

Appendix D provides an overview of methods (such as 
statistical features, boundary curves, primitive forms, and 
relational graphs) for succinctly representing image features and 
utilizing the resulting representation for recognition. 

Appendix E reviews the various methods for extracting
intrinsic image characteristics such as surface shapes, ranges
and orientations from 2-D images. Also included is a discussion
of extracting shape and velocity from successive images of
objects in motion.

Appendix F provides an overview of higher levels of
representation -- both volumetric and procedural models, and
symbolic descriptions such as relational graphs.

Appendix G reviews how intrinsic images can be given higher 
level interpretations by segmenting intrinsic surface 
characteristics into objects (either by model or symbolic 
description matching) yielding object recognitions or scene
descriptions.

Appendix H reviews real-time visual tracking, needed for
guidance, assembly and other tasks.

A glossary of terms in computer vision is given in Appendix
K. Publication sources for further information are listed in
Appendix L.






II. Definition 


Computer (computational or machine) vision can be defined as
perception by a computer based on visual sensory input.

Horn (1979, pp. 70-71) characterizes machine vision (from a
robotic orientation) as follows:

An optical system forms an image of some three-dimensional
[3-D] arrangement of parts. The two-dimensional [2-D] image
is sensed and converted into machine readable format. It is
the purpose of the machine vision system to derive
information from this image useful in the execution of the
given task. In the simplest case the information sought will
concern only the location and orientation of an isolated
object -- more commonly, objects have to be recognized
and their spatial relationships determined. This can
be viewed as a process in which a description of the
scene being viewed is developed from the raw image.
This description has to be appropriate to the particular
application. That is, irrelevant visual features
should be discarded, while needed relationships between
parts of objects must be deduced from their optical
projection.

Barrow and Tenenbaum (1981, p. 573) enlarge on this from a
more general viewpoint, stating:

Vision is an information-processing task with
well-defined input and output. The input consists of
arrays of brightness values, representing projections
of a three-dimensional scene recorded by a camera or
comparable imaging device. Several input arrays may
provide information in several spectral bands (color)
or from multiple viewpoints (stereo or time sequence).

The desired output is a concise description of the
three-dimensional scene depicted in the image, the
exact nature of which depends upon the goals and
expectations of the observer. It generally involves a
description of objects and their interrelationships,
but may also include such information as the
three-dimensional structure of surfaces, their physical
characteristics (shape, texture, color, material), and
the locations of shadows and light sources...


In this report, we will follow the lead of Ballard and Brown
(1982, p. 2) and define Computer Vision as "the enterprise of
automating and integrating a wide range of processes and
representations used for vision perception." The emphasis will
be on generating a description or an understanding of the scene
from which the image was obtained. The next chapter will enlarge
on this point of view.




III. Origins of Computer Vision

Computer vision is based largely on ideas from three related
fields: image processing, pattern recognition and scene
analysis.

Rosenfeld (1981, p. 596) states that, "In image processing, 
the input and the output are both images with the output an 
improved version of the input." In preprocessing we have gray- 
scale modification (usually to normalize scene brightness and 
contrast), sharpening (to restore the weakened high spatial 
frequencies) and smoothing to remove noise in the image. If two 
images have to be compared, they may have to be registered (i.e.,
geometrically transformed to make them congruent) before matching
them.
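As a rough illustration of two of the preprocessing steps named above, the sketch below shows a linear gray-scale stretch and a 3x3 neighborhood averaging filter on an image held as a list of rows. The function names, the 0-255 output range, and the choice of a box filter are assumptions made for this example; the source does not prescribe any particular operator.

```python
def stretch_gray(image, lo=0, hi=255):
    """Linearly rescale pixels so the darkest maps to lo and the brightest to hi."""
    flat = [p for row in image for p in row]
    pmin, pmax = min(flat), max(flat)
    if pmax == pmin:
        # A constant image carries no contrast to stretch.
        return [[lo for _ in row] for row in image]
    scale = (hi - lo) / (pmax - pmin)
    return [[round(lo + (p - pmin) * scale) for p in row] for row in image]


def box_smooth(image):
    """Replace each pixel by the mean of its 3x3 neighborhood (clipped at borders)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


stretched = stretch_gray([[10, 20], [30, 40]])
smoothed = box_smooth([[0, 0, 0], [0, 9, 0], [0, 0, 0]])
```

In both functions the input and the output are images, which is the defining property of image processing in Rosenfeld's sense: an isolated bright pixel is spread over its neighbors by the smoothing step, but no description of the scene is produced.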


In pattern recognition the input is the image, but the
output is a description of the image based on a priori knowledge
of expected patterns. The computer usually starts with a list of
brightness values associated with the array of hundreds of
thousands of points corresponding to the image. Recognizing a
pattern means replacing this mass of undigested data with a much
simpler, more useful description. However, it is usually
impractical to search directly in this array of intensity values
for examples of the patterns we are interested in. Instead, it is
often more convenient to first search for examples of simpler
patterns (such as edges and regions), referred to as features. A
simplified description of the image constructed from these
features can then be used as the basis for pattern recognition.*
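The feature-then-classify idea can be sketched on a single scanline: find step edges, split the line into regions, summarize the regions as a small feature vector, and pick the nearest stored prototype. Everything specific here (the threshold, the two-element feature vector, the prototype labels and values) is hypothetical and chosen only to keep the example small.

```python
import math


def step_edges(row, threshold):
    """Return (position, signed_jump) wherever adjacent pixels differ by >= threshold."""
    edges = []
    for x in range(len(row) - 1):
        jump = row[x + 1] - row[x]
        if abs(jump) >= threshold:
            edges.append((x, jump))
    return edges


def regions_from_edges(row, edges):
    """Split the scanline at the edges into (start, end, mean_brightness) regions."""
    cuts = [0] + [x + 1 for x, _ in edges] + [len(row)]
    regions = []
    for start, stop in zip(cuts, cuts[1:]):
        if stop > start:
            mean = sum(row[start:stop]) / (stop - start)
            regions.append((start, stop - 1, mean))
    return regions


def feature_vector(regions, bright_level):
    """Summarize regions as (number of bright regions, total bright width in pixels)."""
    bright = [(a, b) for a, b, mean in regions if mean >= bright_level]
    return (len(bright), sum(b - a + 1 for a, b in bright))


def nearest_prototype(features, prototypes):
    """Return the label of the prototype closest (Euclidean) to the feature vector."""
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))


# Hypothetical prototypes in the same (count, width) feature space.
PROTOTYPES = {"single bar": (1, 2), "double bar": (2, 4)}

scanline = [10, 10, 10, 200, 200, 10, 10]
edges = step_edges(scanline, threshold=50)
regions = regions_from_edges(scanline, edges)
features = feature_vector(regions, bright_level=100)
label = nearest_prototype(features, PROTOTYPES)
```

The seven brightness values are replaced first by two edges, then by three regions, and finally by a single label, which is the "much simpler, more useful description" the text refers to. A practical system would work in two dimensions and with far richer features.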

Scene analysis is concerned with the transformation of 
simple features into abstract descriptions relating to objects 
that cannot be simply recognized based on pattern matching.

Brady (1981A, pp. 4-5), referring to scene analysis as image
understanding (IU), expands on the differences between pattern
recognition and IU, observing that typically pattern recognition
systems are concerned with recognizing the input as one of a
usually small set of possibilities. Pattern recognition systems
are mostly concerned with images of basically two-dimensional
objects. When the images are of three-dimensional objects, such
as engine parts, they are effectively treated as two-dimensional,
by considering each stable position as a separate object. In
contrast, IU has dealt extensively with three-dimensional images.

More significantly, pattern recognition systems typically
operate directly on the image. IU approaches to most visual
processes (e.g., stereo, texture, shape from shading) operate
not on the image but on symbolic representations that have been
computed by earlier processing such as edge detection.

Arden (1980, pp. 482-483), taking a historical perspective,
contrasts the pattern-recognition and the IU or AI approach as
follows:


*Pratt (1978, pp. 568-569) indicates that in many cases for
simple objects in uncluttered imagery, it is feasible to extract
needed features by transformations of the images (e.g., using a
two-dimensional Fourier transform). The resulting feature space
can be partitioned into regions for classification into objects,
based on prototypes.




Since the early sixties there has been a marked
divergence between the pattern-recognition and AI
approaches to computer analysis of images. The former
approach has continued to stress the use of ad hoc
image features in combination with statistical
classification techniques. More recently, use has been
made of "syntactic" methods in which images are
recognized by a "parsing" process as being built up
hierarchically of primitive constituents. By contrast,
the AI approach has employed problem-solving
methodologies based on extensive use of knowledge about
the class of images, or "scenes," to be analyzed...

Much of the work on computer vision has dealt with
images of scenes containing solid objects viewed from
nearby. These are the sort of images with which a
robot vision system must cope in using vision to guide
its motor activities, including manipulation and
locomotion. The analysis of such images is usually
called "scene analysis," to distinguish it from the
analysis of images that are essentially two-dimensional,
such as photomicrographs (which show cross-sections),
radiographs (which show projections),
satellite imagery (in which terrain relief is
negligible), documents, diagrams, maps, and so on. The
methods of computer vision, however, apply equally to
these latter classes of images; the term need not be
restricted to three-dimensional scene analysis.

In this report we will only treat image processing and
low-level vision to the extent needed for image understanding.
Pattern recognition, which has broken off from AI and has also
become a separate field, will also be given minimum treatment.


To a large extent, the terms scene analysis, image
understanding, and computer vision have become synonymous. The
more advanced vision systems have a strong AI flavor, being
heavily concerned with symbolic processes for representing and
manipulating knowledge in a problem solving mode. Though vision
systems that primarily depend on pattern recognition techniques
are also treated in this report, the intent is to concentrate on
the knowledge-based scene analysis (IU) approach, which is the
major focus in AI computational vision.





In the next chapter we will briefly look at the relation
that human vision has to the AI approach to computer vision.




IV. Relation to Human Vision 


MIT's Marr and Nishihara (1978, p. 42) take the view that
"Artificial Intelligence is (or ought to be) the study of
information processing problems that characteristically have their
roots in some aspect of biological information processing." They
developed a computational theory of vision based on their study 
of human vision. Figure 1 represents the transition from the raw 
image through the primal sketch to the 2-1/2D sketch (indicated 
in Figure 2), which contains information on local surface 
orientations, boundaries, and depths. 

The primal sketch, reminiscent of an artist's hurried 
drawing, is a primitive but rich description of the way the 
intensities change over the visual field. It can be represented 
by a set of short line segments separating regions of different 
brightnesses. A list of the properties of the line segments,
such as location, length, and orientation for each segment can be 
used to represent the primal sketch. 
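One way to picture this representation is as a plain list of segment records from which location, length, and orientation can be read off. The sketch below is an illustrative data structure only: the class name, the endpoint fields, and the convention of reporting orientation in degrees within [0, 180) are assumptions for the example and are not taken from Marr's formulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A short line segment separating two regions of different brightness."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def location(self):
        """Midpoint of the segment in image coordinates."""
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def length(self):
        return math.hypot(self.x1 - self.x0, self.y1 - self.y0)

    @property
    def orientation(self):
        """Undirected orientation in degrees, folded into [0, 180)."""
        angle = math.degrees(math.atan2(self.y1 - self.y0, self.x1 - self.x0))
        return angle % 180.0


# A toy primal sketch: a list of segments and the property list derived from it.
primal_sketch = [Segment(0, 0, 3, 4), Segment(0, 0, 0, 2)]
properties = [(s.location, s.length, s.orientation) for s in primal_sketch]
```

Storing only endpoints and deriving the other properties keeps the records consistent; a fuller sketch would also carry contrast and other attributes of each intensity change.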

The late Dr. Marr and his associates' development of a
human visual information processing theory (Marr, 1982) has had a
substantial impact on computational vision.



Figure 1. A Framework for Early and Intermediate States in a
Theory of Visual Information Processing

[Diagram labels: Intensity Representations; Visible Surface
Representations]

The computations begin with representations of the intensities in
an image: first the image itself (e.g., the gray-level intensity
array) and then the primal sketch, a representation of spatial
variations in intensity. Next comes the operation of a set of
modules, each employing certain aspects of the information
contained in the image to derive information about local
orientation, local depth, and the boundaries of surfaces. From
this is constructed the so-called 2-1/2 dimensional sketch. Note
that no "high-level" information is yet brought to bear: the
computations proceed by utilizing only what is available in the
image itself.

Source: Marr and Nishihara, 1978, p. 42.




Figure 2. An Example of a 2-1/2D Sketch

[Needle diagram not recoverable from the scan.]

A candidate for the so-called 2-1/2-dimensional sketch, which
encompasses local determinations of the depth and orientation of
surfaces in an image, as derived from processes that operate upon
the primal sketch or some other representation of changes in
gray-level intensity. The lengths of the needles represent the
degree of tilt at various points in the surface; the orientations
of the needles represent the directions of tilt... Dotted lines
show contours of surface discontinuity. No explicit
representation of depth appears in this figure.

Source: Marr and Nishihara, 1978, p. 41.
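The needle encoding described in the caption can be sketched numerically. The example below assumes, purely for illustration, that each surface patch is given by a normal vector (nx, ny, nz) with the viewer looking along the z axis, and that a needle is the projection of the unit normal onto the image plane; the function name and this projection convention are assumptions, not details given in the source.

```python
import math


def needle(normal):
    """Map a surface normal to (needle_length, needle_direction_degrees).

    Length is the sine of the angle between the normal and the line of
    sight (0 for a surface facing the viewer, 1 for one seen edge-on).
    Direction is the image-plane direction in which the surface tilts.
    """
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / norm, ny / norm, nz / norm
    length = math.hypot(nx, ny)
    # A needle of zero length has no meaningful direction; report 0 by convention.
    direction = math.degrees(math.atan2(ny, nx)) if length > 0 else 0.0
    return length, direction


facing_viewer = needle((0, 0, 1))   # no tilt: the needle shrinks to a point
tilted_right = needle((1, 0, 1))    # 45 degrees of tilt toward +x
edge_on_up = needle((0, 1, 0))      # seen edge-on, tilting toward +y
```

Under these assumptions a field of such (length, direction) pairs, one per image location, carries local surface orientation without storing depth explicitly, which matches the caption's remark that no explicit representation of depth appears in the figure.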



Barrow and Tenenbaum (1981, pp. 579-580) also seek insights
into the organization of a high-performance, general-purpose
visual system from observations of the behavior of the human
visual system. They observe that a person looking at a natural
scene, such as a landscape, is aware of many intermediate
levels of description, such as surfaces, volumes, and shadows.
Over a wide range of viewpoint and illumination, a person can
readily estimate quite accurately such local surface
characteristics as reflectance, color, texture, distance, and
orientation, as well as such global characteristics as size and
shape. Boundaries are seen not merely as intensity
discontinuities, but as physically significant events:
discontinuities in distance, orientation, reflectance, incident
illumination, and so forth. Humans also experience immediate
global perceptions: the type of scene (landscape), the dominant
orientations of the support plane and the gravitational vertical,
the direction of illumination, and the viewpoint with respect to
these. Thus, what a person sees are intrinsic characteristics
of three-dimensional surfaces, not transient features of a
two-dimensional image as observed under a particular set of viewing
conditions.

They also note that perception by humans of surfaces and 
surface boundaries does not appear to depend critically upon 
contrast nor familiarity with the specific objects depicted. 

There are strong indications (cf. Gevarter, 1977) that the
interpretative planning areas of the human brain set up a context
for processing the input data. (This is captured by Minsky's
(1975) AI "frame" concept for knowledge representation.) The
brain then uses visual and other cues from the environment to
draw in past knowledge to generate an internal representation and
interpretation of the scene. This knowledge-based,
expectation-guided approach to vision is now appearing in the
advanced AI computer vision systems (discussed in later chapters).

Barrow and Tenenbaum suggest that insights gained by
studying human vision, coupled with experience resulting from 
building machine vision systems, can provide the basis for a 
computational model of visual processing. Their approach to a 
general purpose computer vision system will be pursued in Chapter 
VI, but now we pause to motivate this pursuit by briefly 
reviewing applications of computer vision already underway in 
this rapidly growing field. 





V. Applications 

Brady (1981A, p. 2) states that, "There is currently a surge
of interest in image understanding on the part of industry and
the military." Current computer vision applications, primarily
taken from Brady (1981A, pp. 3-4), are listed in Figure 3.



Figure 3. Examples of Applications of Computer Vision Now
Underway


AUTOMATION OF INDUSTRIAL PROCESSES 

Object acquisition by robot arms, for example sorting 
or packing items arriving on conveyor belts. 

Automatic guidance of seam welders and cutting tools. 

VLSI-related processes, such as lead bonding, chip 
alignment and packaging. 

Monitoring, filtering, and thereby containing the 
flood of data from oil drill sites or from seismographs. 

Providing visual feedback for automatic assembly and 
repair. 

INSPECTION TASKS 

The inspection of printed circuit boards for spurs,
shorts, and bad connections. 

Checking the results of casting processes for 
impurities and fractures. 

Screening medical images such as chromosome slides,
cancer smears, x-ray and ultrasound images, tomography. 

Routine screening of plant samples. 

Inspection of alpha-numerics on labels and manufactured
items.

Checking packaging and contents in pharmaceutical and 
food industries. 

Inspection of glass items for cracks, bubbles, etc.

REMOTE SENSING

Cartography: the automatic generation of hill-shaded 
maps, and the registration of satellite images with 
terrain maps. 

Monitoring traffic along roads, docks, and at
airfields.

Management of land resources such as water, forestry, 
soil erosion, and crop growth. 

Exploration of remote or hostile regions for fossil 
fuels and mineral ore deposits. 



MAKING COMPUTER POWER MORE ACCESSIBLE

Management information systems that have a 
communication channel considerably wider than current 
systems that are addressed by typing or pointing. 

Document readers (for those who still use paper). 

Design aids for architects and mechanical engineers. 

MILITARY APPLICATIONS 

Tracking moving objects. 

Automatic navigation based on passive sensing. 

Target acquisition and range finding. 

AIDS FOR THE PARTIALLY SIGHTED 

Systems that read a document and speak what they read.

Automatic "guide dog" navigation systems.






VI. Basis for a General Purpose Image Understanding System


Barrow and Tenenbaum (1981, p. 573) observe that in going
from a scene to an image (an array of brightness values), the
image encodes much information about the scene, but the
information is confounded in the single brightness value at each
point. In projecting onto the two-dimensional image,
information about the three-dimensional structure of the scene is
lost. In order to decode brightness values and recover a scene
description, it is necessary to employ a priori knowledge
embodied in models of the scene domain, the illumination, and the
imaging process.

Scene models can be devised to describe the three-dimensional
world in terms of surfaces and objects.

Illumination models can be utilized to describe the primary
light sources, their positions, spatial extents, intensities, 
colors, and so forth. 

Sensor models describe the photometric and geometric
properties of the sensor, which can be used to predict how a
particular scene, observed from a particular viewpoint and under
particular illumination conditions, is transformed into the two-
dimensional array of brightness values that constitutes the
input.
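
The way a single brightness value confounds scene, illumination and sensor factors can be illustrated with a small sketch. It assumes a matte (Lambertian) surface lit by one distant point source; that shading rule, the function name and the numbers are illustrative assumptions, not models taken from Barrow and Tenenbaum.

```python
import math

def lambertian_brightness(albedo, source_intensity, incidence_deg):
    """Brightness of a matte patch under a distant point source.

    Assumed (hypothetical) model: brightness = albedo * intensity * cos(incidence).
    """
    cos_i = max(0.0, math.cos(math.radians(incidence_deg)))
    return albedo * source_intensity * cos_i

# Two physically different patches:
# a light patch tilted 60 degrees away from the source ...
bright_tilted = lambertian_brightness(albedo=0.8, source_intensity=1.0, incidence_deg=60.0)
# ... and a darker patch facing the source directly.
dark_facing = lambertian_brightness(albedo=0.4, source_intensity=1.0, incidence_deg=0.0)

# Both produce (numerically) the same pixel value, so the pixel alone
# cannot tell reflectance apart from surface orientation.
same_pixel = abs(bright_tilted - dark_facing) < 1e-9
```

Under this assumed model the two patches are indistinguishable from one pixel, which is why a priori knowledge of the scene, illumination and sensor is needed to decode the image.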

As indicated by Figure 4, computer vision is an active
process that uses these models to interpret the sensory data. To
accommodate the diversity of appearance found in real imagery, a
high-performance, general-purpose system must embody a great deal
of knowledge in its models.


Figure 4. Model-based Interpretation of Images

Source: Barrow and Tenenbaum, 1981, p. 573.


The next three chapters review the work in devising computer
vision systems. Chapter VII discusses paradigms for computer
vision systems. Chapter VIII presents the levels of 
representation appropriate to high performance systems. Chapter 
IX reviews research efforts in building such systems. 



VII. Basic Paradigms for Computer Vision¹


In broad terms, an image understanding system starts with
the array of pixel amplitudes that define the computer image,
and using stored models (either specific or generic) determines
the content of a scene. Typically, various symbolic features
such as lines and areas are first determined from the image.
These are then compared with similar features associated with
stored models to find a match, when specific objects are being
sought. In more generic cases, it is necessary to determine
various characteristics of the scene, and using generic models
determine from geometric shapes and other factors (such as
allowable relationships between objects) the nature of the scene
content.

A variety of paradigms have been proposed to accomplish
these tasks in image understanding systems. These paradigms are
based on a common set of broadly defined processing and
manipulating elements: feature extraction, symbolic
representation, and semantic interpretation. The paradigms
differ primarily in how these elements (defined below) are
organized and controlled, and the degree of artificial
intelligence and knowledge employed.

A. Hierarchical Bottom-up Approach

Figure 5A is a block diagram of a hierarchical paradigm of 
an image understanding system that employs a bottom-up processing
approach. First, primitive features are extracted from the array 
of picture element intensities that constitute the observed 


¹This chapter is primarily based on Pratt, 1978, pp. 570-579.

Figure 5. Basic Image Understanding Paradigms

A. Hierarchical Bottom-up Approach
B. Hierarchical Top-down Approach
C. Heterarchical Approach
D. Blackboard Approach

[Block diagrams: in each panel the image passes through feature
extraction, symbolic representation and semantic interpretation,
drawing on visual models, to yield a description.]

Source: Pratt, 1978, pp. 570-574.


image. Examples of such features are picture element ("pixel") 
amplitudes, edge point locations and textural descriptors. 

Next this set of features is passed on to the symbolic
representation stage where the features are grouped into symbolic
representations. For example, edge points are grouped into line
segments or closed curves, and adjacent region segments of common
attributes are combined. The resultant symbol set of lines,
regions, etc., in combination with a priori stored models, is
then operated upon (i.e., semantically interpreted) to produce an
application dependent scene description.

Bottom-up refers to the sequential processing and control
operation of the system starting with the input image. The key
to success in this approach lies in a sequential reduction in
dimensionality from stage to stage -- vital as the relative
processing complexity is generally greater at each succeeding
stage. The hierarchical bottom-up approach can be developed
successfully for domains with simple scenes made up of only a
limited number of previously known objects.
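
The three bottom-up stages can be sketched as a fixed pipeline in which each stage hands a smaller data set to the next. This is a minimal illustration only: the edge rule, the grouping rule, the stored models and all names below are hypothetical and are not taken from Pratt.

```python
def extract_features(image, min_step=50):
    """Feature extraction: (row, col) points where brightness jumps horizontally."""
    points = []
    for r, row in enumerate(image):
        for c in range(1, len(row)):
            if abs(row[c] - row[c - 1]) >= min_step:
                points.append((r, c))
    return points

def build_symbols(edge_points):
    """Symbolic representation: group edge points that share a column and lie on
    consecutive rows into vertical line segments (col, first_row, last_row)."""
    by_col = {}
    for r, c in edge_points:
        by_col.setdefault(c, []).append(r)
    segments = []
    for c in sorted(by_col):
        rows = sorted(by_col[c])
        start = prev = rows[0]
        for r in rows[1:]:
            if r != prev + 1:          # gap: close the current segment
                segments.append((c, start, prev))
                start = r
            prev = r
        segments.append((c, start, prev))
    return segments

def interpret(segments, models):
    """Semantic interpretation: pick the stored model whose expected number of
    vertical edges equals the number of segments found."""
    for name, expected_edges in models.items():
        if len(segments) == expected_edges:
            return name
    return "unknown"

def bottom_up(image, models):
    """Run the stages strictly in sequence, starting from the input image."""
    return interpret(build_symbols(extract_features(image)), models)

# 18 pixels -> 6 edge points -> 2 line segments -> 1 label.
IMAGE = [
    [0, 0, 200, 200, 0, 0],
    [0, 0, 200, 200, 0, 0],
    [0, 0, 200, 200, 0, 0],
]
MODELS = {"bar": 2, "step": 1}   # hypothetical stored models
LABEL = bottom_up(IMAGE, MODELS)
```

The shrinking data volume from stage to stage is the reduction in dimensionality described above.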

B. Hierarchical Top-down Approach

This approach (usually called hypothesize and test), shown 
in Figure 5B, is goal directed, the interpretation stage being 
guided in its analysis by trial or test descriptions of a scene. 

An example would be using template matching -- matched filtering
-- to search for a specific object or structure within the scene.
Matched filtering is normally performed at the pixel level by
cross correlation of an object template with an observed image
field. It is often computationally advantageous, because of the
reduced dimensionality, to perform the interpretation at a higher
level in the chain by correlating image features or symbols
rather than pixels.
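
Pixel-level template matching by cross correlation can be sketched as follows. The sketch uses zero-mean normalized correlation, which is one common variant and an assumption here (the report does not say which form is meant); the function names and the tiny test image are hypothetical.

```python
import math

def normalized_cross_correlation(patch, template):
    """Zero-mean normalized correlation of two equal-sized 2-D lists.

    Returns a score in [-1, 1]; 0.0 is returned if either input is flat.
    """
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    p_mean = sum(p) / len(p)
    t_mean = sum(t) / len(t)
    num = sum((a - p_mean) * (b - t_mean) for a, b in zip(p, t))
    den = math.sqrt(sum((a - p_mean) ** 2 for a in p) *
                    sum((b - t_mean) ** 2 for b in t))
    return num / den if den else 0.0

def match_template(image, template):
    """Slide the template over every position and keep the best score."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -2.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_cross_correlation(patch, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

TEMPLATE = [[0, 9],
            [9, 0]]
IMAGE = [
    [1, 1, 1, 1],
    [1, 1, 0, 9],
    [1, 1, 9, 0],
    [1, 1, 1, 1],
]
POSITION, SCORE = match_template(IMAGE, TEMPLATE)
```

The cost grows with the number of pixel positions times the template size, which is why correlating a few features or symbols instead of pixels can be cheaper.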




C. Heterarchical Approach 

Hierarchical image understanding systems are normally
designed for specific applications. They thus tend to lack
adaptability. A large amount of processing is also usually
required. Pratt (1978, pp. 572-573) observes that often much of
this processing is wasted in the generation of features and
symbols not required for the analysis of a particular scene. A
technique to avoid this problem is to establish a central monitor
to observe the overall performance of the image understanding 
system and then issue commands to the various system elements to 
modify their operation to maximize system performance and 
efficiency. 

Figure 5C is a block diagram of an image understanding
system that achieves heterarchical operation by distributed 
feedback control. If the semantic interpretation stage in the 
model experiences difficulty in working with its input symbol 
set, control can be fed back to the symbolic representation stage 
to request a new set of symbols. This action in turn may result 
in a command to the feature extraction stage requesting a 
modified set of features. When required, direct feedback control 
is also possible between the semantic interpreter and feature 
extractor. This paradigm provides an important auxiliary benefit 
in addition to flexibility. That is, the dimensionality of the 
feature and symbol sets can be kept at minimum levels because the 
sets can be restructured on command. 
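
The feedback idea can be sketched in a few lines: the interpretation stage starts with a coarse, small feature set and asks for a more sensitive one only when its model does not fit. This is a deliberately simplified illustration on a one-dimensional brightness profile; the threshold schedule, the stripe model and all names are hypothetical.

```python
def extract_edges(signal, min_step):
    """Feature extraction: indices where the profile jumps by at least min_step."""
    return [i for i in range(1, len(signal))
            if abs(signal[i] - signal[i - 1]) >= min_step]

def group_into_stripes(edges):
    """Symbolic representation: pair consecutive edges as (start, end) stripes."""
    return [(edges[i], edges[i + 1]) for i in range(0, len(edges) - 1, 2)]

def interpret_with_feedback(signal, expected_stripes, thresholds=(100, 50, 20)):
    """Semantic interpretation with feedback control.

    If the current symbol set does not fit the model (wrong stripe count),
    control is fed back: a new feature set is requested with a more
    sensitive threshold, and the symbols are rebuilt from it.
    """
    for min_step in thresholds:
        stripes = group_into_stripes(extract_edges(signal, min_step))
        if len(stripes) == expected_stripes:
            return stripes, min_step
    return None, None

# One strong stripe and one faint stripe.
PROFILE = [10, 10, 200, 200, 10, 10, 70, 70, 10, 10]
STRIPES, USED_THRESHOLD = interpret_with_feedback(PROFILE, expected_stripes=2)
```

Only the faint stripe forces a second request, so the feature set stays at the smallest size that satisfies the interpreter.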

D. Blackboard Approach 

Another image understanding system configuration called the 
blackboard model has been proposed by Reddy and Newell (1975). 







Figure 5D is a simplified representation of this approach in
which the various system elements communicate with each other via
a common working data storage called the blackboard. Whenever
any element performs a task its output is put into the common
data storage, which is independently accessible by all other
elements. The individual elements can be directed by a central
control, or they can be designed to act autonomously to further
the common system goal as required. The blackboard system is
particularly attractive in cases where several hypotheses must be
considered simultaneously and their components need to be kept
track of at various levels of representation.
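
A minimal sketch of the blackboard arrangement follows. Each element looks only at the shared storage, contributes when its inputs are present, and never calls another element directly. The class, the three elements and the one-row "image" are hypothetical illustrations, not the design of Reddy and Newell.

```python
class Blackboard:
    """Common working data storage that every element can read and write."""
    def __init__(self, image):
        self.entries = {"image": image}

def feature_extractor(bb):
    """Posts the indices where the one-row image changes value."""
    if "features" in bb.entries:
        return False
    row = bb.entries["image"]
    bb.entries["features"] = [i for i in range(1, len(row)) if row[i] != row[i - 1]]
    return True

def symbol_builder(bb):
    """Posts (start, end) regions once features are available."""
    if "features" not in bb.entries or "symbols" in bb.entries:
        return False
    f = bb.entries["features"]
    bb.entries["symbols"] = list(zip(f[0::2], f[1::2]))
    return True

def interpreter(bb):
    """Posts a description once symbols are available."""
    if "symbols" not in bb.entries or "description" in bb.entries:
        return False
    bb.entries["description"] = f"{len(bb.entries['symbols'])} bright region(s)"
    return True

def run(bb, elements):
    """Central control: offer the blackboard to every element until none contributes."""
    progress = True
    while progress:
        progress = False
        for element in elements:
            if element(bb):
                progress = True
    return bb.entries.get("description")

# The elements are listed in "wrong" order on purpose: the result does not
# depend on call order, only on what is already on the blackboard.
BOARD = Blackboard([0, 0, 1, 1, 0, 1, 0])
DESCRIPTION = run(BOARD, [interpreter, symbol_builder, feature_extractor])
```

Because all intermediate results stay on the blackboard, competing hypotheses at different levels could be stored side by side under separate keys.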




VIII. Levels of Representation

A computer vision system, like human vision, is commonly
considered to be naturally structured as a succession of levels
of representation. Tenenbaum et al. (1979, pp. 242-243) suggest
the following levels (listed from low to high):

Images
Pictorial features
Intrinsic surfaces and bodies
3-D surfaces and bodies
Space map
Symbolic relationships

Tenenbaum et al. contrast this with current industrial 
vision systems relying heavily on detailed models of particular 
objects to accomplish tasks, employing levels of:

Images 

Pictorial features (Edges & Regions) 

2-D feature attributes 
Objects (specific 2-D views) 

Current industrial systems usually begin by thresholding the
original gray-level image to obtain a binary array. Pictorial
features (regions or edges) are then extracted from the
gray-level or binary image and equated with surfaces or surface
boundaries. The 2-D attributes of these pseudo-surface
features are then symbolically matched against 2-D models
(representing specific views of expected objects) to achieve
recognition. As these industrial systems rely on prototype 2-D
representations of anticipated objects, they are very limited for
use in more general environments.
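
The industrial sequence of thresholding, 2-D attribute extraction and matching against stored 2-D views can be sketched as below. The threshold level, the attribute set (area and bounding box), the part names and the exact-match rule are assumptions made for illustration; the sketch also treats the whole silhouette as a single region.

```python
def threshold(gray, level):
    """Convert a gray-level image (list of rows) to a binary array."""
    return [[1 if v >= level else 0 for v in row] for row in gray]

def region_attributes(binary):
    """2-D attributes of the silhouette: area and bounding-box width and height."""
    cells = [(r, c) for r, row in enumerate(binary)
             for c, v in enumerate(row) if v]
    if not cells:
        return {"area": 0, "width": 0, "height": 0}
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return {
        "area": len(cells),
        "width": max(cols) - min(cols) + 1,
        "height": max(rows) - min(rows) + 1,
    }

def recognize(attributes, models_2d):
    """Match the attributes against stored 2-D models of specific views."""
    for name, model in models_2d.items():
        if attributes == model:
            return name
    return None

GRAY = [
    [12, 15, 11, 10],
    [14, 180, 190, 13],
    [11, 185, 175, 12],
    [10, 12, 14, 11],
]
MODELS_2D = {   # hypothetical taught views
    "square_nut": {"area": 4, "width": 2, "height": 2},
    "long_pin": {"area": 4, "width": 4, "height": 1},
}
PART = recognize(region_attributes(threshold(GRAY, 128)), MODELS_2D)
```

A part seen from an untaught viewpoint would yield different 2-D attributes and fail to match, which illustrates the limitation noted above.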

Barrow and Tenenbaum (1981, pp. 580-581) suggest the levels 
given in Figure 6 as those appropriate to a general-purpose 
vision system. The processing steps in the figure that transform 
each level of representation to the next require knowledge from
models of the physics of the imaging process, the illumination,
and the scene. At the lower levels, these models help resolve
the ambiguity associated with going from a three dimensional 
world to a two dimensional image. At the higher levels, these 
models provide a foundation for organizing surface fragments into 
recognizable objects. 


Figure 6: Computational Architecture for a General-Purpose Vision System

Source: Barrow and Tenenbaum, 1981, p. 58.


The input models required to do the processing at each level
are shown at the right. On the left are shown the tasks for
which vision can be used at each level of processing.

Tenenbaum et al. (1979, pp. 254-255) sketch in Figure 7
another way in which to view an organization of a vision system.
They divide the figure into two parts. The first is image
oriented (iconic), domain independent, and based on the image
data (data driven). The second part of the figure is symbolic,
dependent on the domain and the particular goal of the vision
process.

The first portion takes the image, which consists of an
array of intensity of picture elements ("pixels," e.g.,
1000x1000), and converts it into image features such as edges and
regions. These are then converted into a set of parallel
"intrinsic images", one each for distance (range), surface
orientation, reflectance², etc.

The second part of the system segments these into volumes 
and surfaces dependent on our knowledge of the domain and the 
goal of the computation. Again using domain knowledge and the 
constraints associated with the relations among objects in this 
domain, objects are identified and the scene analyzed consistent 
with the system goal. 

²Fraction of normal incident illumination reflected.


Figure 7. Organization of a Visual System

Source: Tenenbaum et al., 1979, p. 255.







Reviews of the representation methods and techniques for
performing the operations indicated in Figure 7 are given in the 
appendices. 

The next chapter (Chapter IX) provides an overview of
research in model-based vision systems. These systems endeavor
to start with an image and produce, using a priori models, a
desired description of the original scene, thereby spanning the
complete hierarchy of Figure 7. The systems are constructed
using the various representations, techniques and models reviewed 
in the appendices. 





IX. Research in Model-based Vision Systems

Most research efforts in vision have been directed at
exploring various aspects of vision, or toward generating
particular processing modules for a step in the vision process
rather than in devising general purpose vision systems. However,
there are currently two major U.S. efforts in general purpose
vision systems: the ACRONYM system at Stanford University under
the leadership of T. Binford, and the VISIONS system at the
University of Massachusetts at Amherst under A. Hanson and E.
Riseman.

The ACRONYM system, outlined in Table II-1, is designed to
be a general purpose, model-based system that does its major
reasoning at the level of volumes rather than images. The system
basically takes a hierarchical top-down approach as in Figure 5B.
ACRONYM has four essential parts: modeling, prediction,
description and interpretation. The user provides ACRONYM with
models of objects (modeled in terms of volume primitives called
generalized cones) and their spatial relationships, as well as
generic models and their subclass relationships. These are both
stored in graph form. The program automatically predicts which
image features to expect. Description is a bottom-up process
that generates a model-independent description of the image. 
Interpretation relates this description to the prediction to 
produce a three-dimensional understanding of the scene. 

The VISIONS system, outlined in Table II-2, can be considered
to be a working tool to test various image understanding modules
and approaches. Rather than using specific models, its high
level knowledge is in the form of framelike "schemas" which
represent expectations and expected relationships in particular
scene situations. VISIONS is based on monocular images and does
its reasoning at the level of images rather than volumes.

Other research efforts in model-based vision systems are 
summarized in Table III in Appendix K.

It will be observed that each system is individually crafted 
by the developer to reflect the developer's background, interests 
and domain requirements. All, except ACRONYM (and to an extent 
MOSAIC), use image (2-D) models and are viewpoint dependent. 
Models are mostly described by semantic networks, though feature 
vectors are also utilized. The systems capitalize on their 
choice to limit their observations to only a few objects, by 
using predominantly a top-down interpretation of images, relying 
heavily on prediction. 





Table II-1
Model-Based Vision Systems

Developer: Brooks et al. (1979), Brooks (1981)
System: ACRONYM
Purpose: General Purpose Vision System
Example Domains: Identifying airplanes on a runway in aerial images
                 Simulation for robot systems and for automated
                 grasping of objects

Approach

  Hierarchical top-down approach.

  Reasons between different levels of representation based on a
  hierarchy of representations.

  High level modeler provides a high level language to manipulate
  models using symbolic names.

  Predictor and Planner Module is a rule-based system to generate
  an Observability Graph from the Object Graph (3-D object
  representation consisting of nodes and relational arcs).

  Makes predictions (which are viewpoint insensitive) in the form
  of symbolic constraint expressions with variables.

  Makes a projective transformation from models.

  Predicts appearances of models in images in terms of ribbons and
  ellipses.

  Incorporates translation and rotation into observable
  representations.

  Searches for instances of models in images. It employs geometric
  reasoning in the form of a rule based problem-solving system.

  It interprets (matches) in 3-D by enforcing constraints of the
  3-D model.

Modeling

  Represents object classes from which subclasses and specific
  objects are represented by numeric constraints.

  Models 3-D objects using volume primitives: generalized cones
  and ribbons.

  Spatial relations of volume elements within an object defined
  hierarchically.

  Can model both specific and generic volume elements and
  relations between them.

  Models in part/whole graph.

  Volume primitives have local rather than viewer-centered
  primitives.

Image Feature Extraction & Representation

  Ribbons and curves obtained from an edge mapper.

  Surfaces obtained from a stereo mapper.

  Nodes of the Picture Graph (symbolic version of image)
  correspond to ribbons, surfaces and curves.

  Arcs and relations indicate spatial relations between nodes.

Search & Matching

  Matcher does an interpretation matching by mapping the
  Observability Graph into the Picture Graph.

  Matcher works in a coarse to fine order.

  Combines local matches of ribbons into clusters.

  Searches for maximal subgraph matches in the Observability
  Graph.

  Performs major interpretation at the level of volumes rather
  than at the level of images.

Remarks

  Aims to be a general vision system.

  Insensitive to viewpoint.

  [illegible] ... information for interpretation.

  Feature extraction (e.g., finding lines and regions) still weak.

  Interpretation is limited to scenes with few objects.

  Substantial progress has been achieved in past few years.



Table II-2
Model-Based Vision Systems

Developer: Hanson & Riseman (1978b, c)
System: VISIONS
Purpose: Interpreting static monocular scenes
         Can be considered to be a working tool to test various
         image understanding modules and approaches
Example Domains: House scenes from ground level
                 Road scenes from ground level

Approach

  Uses a hierarchical modular approach to representation and
  control.

  Tries to be as general as possible to allow both bottom-up and
  top-down solution hypotheses as well as various intermediate
  combinations.

  Incorporates the flexibility to utilize various feature
  extraction modules and multiple knowledge sources as required.

  Allows for the possibility of generating and verifying
  hypotheses along many paths.

Modeling

  Hierarchical structure.

  Scene schemas (like frames) are the highest representation.

  Hierarchy is:
    -schemas
    -objects
    -volumes
    -surfaces

  Proposed representations of 3-D surfaces and volumes include:
    -generalized cylinders
    -surface patches with cubic B-splines to represent boundary
     and blending functions

  Employs semantic networks:
    -nodes represent primitive entities (objects, concepts,
     situations, etc.)
    -labeled arcs represent relationships between them

Image Feature Extraction & Representation

  Uses both edge finding and region growing to segment the image
  into a layered directed graph of regions, line segments and
  vertices.

  Uses a hierarchical processing cone (pyramid) to be able to
  handle image data at various levels of resolution.

  Uses a relaxation approach to organize edges into boundaries,
  and pixel clusters into regions, using high-level system
  guidance (interpretation guided segmentation).

Search & Matching

  Generates and stores partial models in "contexts" (of the
  CONNIVER programming language) which provide a history of
  decisions to be used when backtracking is necessary.

  Uses a multiple knowledge source heterarchical approach which
  generates partial models in the search space of models.
  Attempts, using top-down and bottom-up relaxation techniques, to
  converge on a most probable solution.

  Uses rules for focussing on an element of a task, expanding that
  element by generating new hypotheses and verifying new
  hypotheses.

Remarks

  Has done reasonably well in [illegible] a crude segmentation of
  a house scene.

  Viewpoint dependent.

  [illegible]



X. Industrial Vision Systems

A. General Characteristics

The prominent aspect of industrial vision systems, in
distinction to more general vision systems, is that they operate
in a relatively known and structured environment. In addition,
the situation (such as placement of cameras and lighting) can be
configured to simplify the computer vision problem. Usually, the
number and nature of possible objects will tend to be restricted,
and the visual system will be tailored to the function performed.
Thus many of them are based on a pattern recognition, rather than
an image understanding approach. Industrial vision systems are
characteristically used for such activities as inspection,
manipulation and assembly.

In an inspection task, the focus is on deviations from a
standard, and usually little or no information is needed for
identification. A manipulator controller, designed to pick parts
off a conveyor, needs to be able to determine the identity,
orientation and position of parts, but needs to know little of
their precise shape except perhaps at the grasp point. A visual
controller for an arc welder will have its focus on the seam 
properties and needs little information about the appearance of 
the parts. 

Kruger and Thompson (1981, p. 1525), in discussing the 
design of industrial vision systems, state: 

The complexity of most perceptual tasks requires 
that the problem be decomposed into manageable 
subunits. Thus major design decisions include the 
function of each module, the computational techniques 
and data representations imbedded in each module, and 
the control structures that relate modules and transfer
information between them. Most computer vision systems
use a hierarchical organization...

A popular organization for industrial computer vision is a
two-stage hierarchy with a bottom-up control flow. The lower
level segments the image into regions corresponding to object
surfaces. The higher level uses this segmentation to identify
objects from their surface descriptions.

In practice, most successful systems incorporate aspects of 
both bottom-up and top-down control. The bottom-up processing is
used to extract prominent features of a part to determine its
position. Then, top-down control is used to direct a search to
determine if the part satisfies an inspection criterion. 

Industrial inspection and assembly operations are well 
suited to model-based analysis, because of the well-defined 
geometric descriptions associated with manufactured items. 
CAD/CAM technology allows the specification of objects using 
either volumetric or surface-based models. These geometrically 
based models are particularly appropriate to the hypothesis- 
verify approach, in which low-level image features are extracted 
and matched to an appropriate computer generated 2-D 
representation.

In addition to geometric models, objects may also be 
represented by graphs. In this case, recognition becomes a 
graph-matching process. 

More commonly at present, rather than using geometric models 
or graphs, industrial vision systems are taught by being 
presented sample parts to be recognized in each of their expected 
stable states. Aspects of the resulting images are typically
stored as templates, and recognition becomes template matching.
The objects can also be represented in terms of their
characteristic features, such as area, number of holes, etc., and
the resulting feature vector stored to be matched (via a search
process) to the corresponding extracted feature vector of the
image during system operation.
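
Matching an extracted feature vector against taught ones can be sketched as a scaled nearest-neighbor search. The part names, the feature values, the scale factors and the choice of Euclidean distance are all hypothetical; they only illustrate the idea of teaching by showing and then matching.

```python
import math

# Feature vectors (area, number of holes, perimeter) stored while the
# system was "taught" with sample parts. All values are invented.
TAUGHT_PARTS = {
    "washer": (310.0, 1, 62.0),
    "bracket": (540.0, 2, 118.0),
    "plate": (900.0, 0, 120.0),
}

def nearest_part(observed, taught, scales):
    """Return the taught part whose feature vector is closest to the observed one.

    Each feature difference is divided by its scale so that a large-valued
    feature such as area does not dominate the distance.
    """
    def distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))
    return min(taught, key=lambda name: distance(observed, taught[name]))

# Vector extracted from the current image (slightly noisy measurement).
OBSERVED = (525.0, 2, 115.0)
SCALES = (100.0, 1.0, 10.0)
MATCH = nearest_part(OBSERVED, TAUGHT_PARTS, SCALES)
```

With only a handful of taught parts the search is a simple scan; a decision tree over the same features is an alternative when speed matters.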

To simplify industrial vision systems, the input is usually
reduced to a binary (black and white) image, so that objects
appear as silhouettes. Simplicity is important in industrial
vision systems because the computation time is limited, as most
systems are expected to operate in near real time.

B. Examples of Efforts in Industrial Visual Inspection Systems 

Table IV (based largely on Kruger and Thompson, 1981) lists 
some example efforts of vision systems designed for inspection. 
The systems listed are primarily for the inspection of printed 
circuit boards and IC chips, with template matching being the 
predominant inspection approach.

Kruger and Thompson (1981, p. 1529) note that: "Automated
visual analysis has also been applied to the inspection of
surface properties such as roughness, scratching and other
potential defects. The best successes have come with highly
specialized illumination and sensing systems, specifically
tailored for a particular application. Recently, greater
sophistication in the modeling of the imaging process has led to
prototype surface inspection systems with the promise of
increased generality."

Chin (1982) has recently published an extensive bibliography 
on automated visual inspection techniques and applications. 



Table IV

Example Efforts in Industrial Visual Inspection Systems


Developer: Baird (1978), GM
Purpose: Inspection
Sample Domains: Automated manufacture of power transistor-pair
                IC chips

Approach:

  Inspection process consists of
    1) Detection of the IC location and orientation on the
       heat-sink substrate
    2) Quality control assessment after acquisition

  A gradient edge detector is used to compile the histogram of all
  edge directions in the inspection field. Peak of this histogram
  indicates the approximate orientation of the chip.

  Next the corners of the IC are located by template matching. If
  any corners are not located, the IC is rejected.

  Cracked, fractured chips are eliminated by a simple contrast
  thresholding operation.

Modeling and Representation:

  Inspection field consists of a [illegible] pixel region
  digitized to 16 gray levels

  Templates for IC corners


Developer: Chin, Harlow & Dwyer (1977), U. of Maryland
Purpose: Inspection
Sample Domains: PCB's

Approach:

  Training Phase
  Use an operator-interactive model-building graph procedure to
  train the inspection system.
  Using an interactive camera/display system, the binary image
  edges of a prototype PCB board are detected, smoothed to reduce
  noise, and encoded into a compact data structure.

  Inspection Phase
  Matching (against prestored edges and the graph model) is used
  to detect flaws in test images.

Modeling and Representation:

  Edges, Graph Model


Developer: Krakauer & Pavlidis (1979), Princeton University
Purpose: Inspection
Sample Domains: Mass-Produced PCB's

Approach:

  Ingenious use of binary template matching using a limited number
  of well-chosen templates accessed via a rapid lookup technique

Modeling and Representation:

  Binary Templates


Developer: Jarvis (1980), Bell Labs
Purpose: Inspection
Sample Domains: Mass-Produced PCB's

Approach:

  1) Local pattern matching to stored binary templates
  2) Supplemental tests for suspicious regions
       -Computation of conductor area
       -Length of the conductor-substrate boundary
       -Ratio of area to length

  Processing done with simple logical operations

Modeling and Representation:

  List of 5x5 pixel binary templates


Developer: Hsieh & Fu (1979, 1980), Purdue University
Purpose: Inspection and wire bonding guidance
Sample Domains: Multi-layered IC chips

Approach:

  Inspection paradigm for proposed system is (for the most part)
  top-down and model-driven using a tree-like syntactic approach

  Various inspection algorithms are called for based on the
  actions of a controller, which monitors the whole vision process

  First, the image goes thru a series of task and
  context-dependent filters to reduce ambiguities

  Then, 8 special purpose defect detectors are used, as required

Modeling and Representation:

  Design and inspection specification in the form of a
  descriptive data base

  Six subpattern masks








C. Examples of Efforts in Industrial Visual Recognition and
Location Systems

Table V (again largely derived from Kruger and Thompson, 
1981) lists some example efforts of vision systems designed for
industrial part recognition and location. All these systems use a
bottom-up approach. It will be observed that (except for Vamos
1979, and Albus et al., 1982) these systems utilize template or
feature vector matching. Vamos does work from a 3D wire frame 
model which utilizes computer graphics type techniques to 
transform a model projection into alignment with observed lines 
in the image. 

Albus' Machine Vision Group in the NBS Industrial Systems
Division is using simplified 3D surface models of machined parts 
to generate expectancy images from needed viewpoints. The group 
is seeking to achieve real-time, hierarchical, multi-sensory, 
interactive robot guidance. 

D. Commercially Available Industrial Vision Systems³

Table VI in Appendix L lists many of the Industrial Vision
Systems that are currently commercially available. Most of the
systems require special lighting. 

It will be observed that many of the systems designed for 
verification and inspection use pattern recognition, rather than 
AI techniques. The systems tend to be bottom-up because of the 
speed requirements to achieve real time operation. Often unique 
edge and feature extraction algorithms are programmed in hardware 
or firmware. 

³Additional information can be found in Gevarter (1982A).






Table V 

Example Efforts in Industrial Visual Recognition and Location Systems 


Developer 

Purpose 

Sample Domains 


Agin ( 1 980) * SRI 
SRI Vision Module 

Locate, identify and guide 
manipulation of industrial 
parts 

Engine Parts 


Kashioka et al (1976). Hitachi 
Central Research Lab 

Location and Bending Guidance 

Transistor wire-bonding . 


■ •af' 


Approach 


Bottom-up approach 

Uses thresholding to convert to a binary image 

Each line is sequentially scanned and edge points (where pixels change 
form 1 to 0 or 0 to 1 recorded). Each resulting segment on a line 
is matched to the previous line to determine their overlapping 
relationships. Using these relationships, the program traces the j 
appearance and disappearance of blobs (regions) as the image is 
processed from top to bottom. 

Using blob descriptors, the system can recognize parts regardless of 
their position or orientation. The descriptors are matched using 
either a binary decision tree or a normalized nearest-neighbor 
method . 

The system is trained by repeatedly showing the object to the TV camera 
resulting in all potentially useful shape descriptions beir auto- 
matically calculated and stored 


Template matching 

Locates appropriate base and emitter leads on a semi-conductor chip 
so that wires can be stretched and bonded between them 

Initially trained by man-machine interactive selection in the universe 
of templates 

Multiplexed computer architecture to accommodate separate cameras on up 
to 50 bonding machines on a time-shared basis 


Modeling and 
Representation 


Blob descriptors 
include: 

-max. and rain, 
x and y values 

-Holes 

-Area 

-Moments of inertia 

-Periaeter length 

-Linker? list of 
coordinates on 
the perimeter 


Local 12x12 pixel 
binary templates 


Tfb'e V (c5r.tfr.v94) 

Csesp 1# tvi'oft? tr. 3r£-- a str1*1 VJiuel Kc-cogfiltles ind location Syttws 


ORl2’?iJAL' P£-SS fs 
07 POOH jUAUTV 


Dtvelopsr 

Purpose 

SaspU D^iilaj 

Holland Rossol 4 Krri (1579> 
Consight I 

Industrial part location, 
recognition » -;d «r.f palettcn 

fngine parts 


MBS: Albas ct el. (1532) 

Visual servoing for robot 
guidance (real-time 
location and identification 
for manipulation) 

Machined parts 

National bureau of Standards 


Approach

Modeling and
Representation


Two linear light sources superimpose a line of light on a conveyor belt
perpendicular to its direction of motion. The two lines separate in
proportion to the part passing by. Point of separation determines
part boundary; degree of separation determines part thickness.

The scene is imaged with a linear array camera and a silhouette
automatically generated.

Uses same feature vector approach as SRI Module.

Feature vector of part
image characteristics


Employs a point light source, a sheets-of-structured-light generator
and a camera, all mounted on the wrist of a robot arm.

Uses alternate frames of:

1. A regular point source illumination of the entire object, and

2. Two parallel planes of structured light.

System determines location and orientation based on triangulation
(associated with relative height of intersection of light sheets with
part), and recognition based on shape and size of observed line that
the planes of light make as they intersect the part. Uses this information
to interpret the outline seen in the image produced by the point source
illumination.

Analysis of vision input is performed with a hierarchically organized
group of microprocessors. At each level of the hierarchy, an analytic
process is guided by an expectancy-generating modeling process. The
modeling process is in turn driven by a store of a priori knowledge, by
knowledge of the robot's movements, and by feedback from the analytic
process. Each such level of the hierarchy provides output to guide a
corresponding level of the robot's hierarchical control system.


Uses quadratic
approximations to
surfaces of idealized
3-D objects.


Perkins (1978)

Industrial parts recognition 
Engine components 


Yachida and Tsuji (1973)
Industrial Parts Recognition
Nonoccluded parts of a
small gasoline engine 

Osaka Univ. 


Vamos (1979) 

Recognition of 3D Objects
Bearing housings 
Assembly 

Sheet metal parts to be 
painted 

Neural nets in microscopic- 
section in neural research 


Operates on 32 gray levels

Bottom-up scene segmentation approach

1. Reduce 256x256 pixel image to an "edge gradient" image

2. Link edges with similar gradient magnitudes to form chains

3. Characterize chains as either straight lines or circular arcs.

(This reduces 65,000 pixel image to about 50 concurves.)

System matches observed concurves with model generated concurves using:

1. A preset control structure to select the order in which
combinations of model and scene concurves are to be matched.

2. Starts by matching one model and one scene concurve

3. The stored model is spatially transformed and rotated to fit
associated scene concurves

System interactively trained by generating concurves of sample parts
Can identify parts partially occluded by other parts

Uses a boundary detection and isolation of parts in a binary image
approach similar to SRI Vision Module

,,n,C, '' red •"•V«1s *«h 

Uses a series of special feature detectors
-hole detector
-line finder
-texture detector
-small hole detector

tttlt&lff ta”;f V ” lnttr,ctt>> examination of the 

"n"?; ^ , i‘,“;;rV e ,;ut« f,rt ,,rs,on ° f th * 

Lines are then fitted to edges

Wire-frame model transformed (and hidden line elimination used) to
correspond to image - yielding recognition and part orientation

°ni^^r^ , J" t r' Ct l'" 1r Uu,, ' t 10 ty ' tm mi,er *7 tu Idlng a 
?£3lK or r * CMWt "'-' lllKl transroraaiion cf viewed 


Concurve models of
sample parts


Stable orientation 
models of parts 

-part name
-orientation 
-list of primitive 
features 

-polar coordinate 
boundary 

representation 


3D Wire Frame models


Hungarian Acad. of Science


The more sophisticated systems tend to utilize variations
and improvements on the SRI Vision Module described in Table V.

A few systems make good use of structured light for 3D
sensing. A number of efforts in guidance of arc welding take
this form.



XI. Who is Doing It

Rosenfeld, at the University of Maryland, issues a yearly
bibliography, arranged by subject matter, related to the computer
processing of pictorial information. The issue covering 1981
(Rosenfeld, 1982) includes nearly 1000 references.

The following is a list by category of the U.S. "principal
players" in computer vision. 

A. Research Oriented
1. Universities

These are shown in Table VII. 

2. Non-Profits

SRI International, AI Center
JPL 

3. U.S. Government 

NBS, Industrial Systems Div., Gaithersburg, MD
NOSC (Naval Ocean Systems Center), San Diego
NIH (National Institutes of Health)

B. Commercial Vision Systems Developers 

A partial listing is given in Table VIII. It has been
reported that hundreds of companies are now involved in
vision systems.




Table VII 


University Organizations Engaged in Computer Vision Research
Artificial Intelligence and Computer Science Laboratories Funded Under DARPA IU Program 


CMU 

U of MD X

MIT 

U of Mass. 

Stanford U 

U of Rochester X 

USC

U of Rhode Island 
Other Active Universities 


U of Texas X 

at Austin 

VPI X 

Purdue X 

U of PA X 

U of IL X

Wayne State U X 

OHU 
RPI 


A. I. Labs Other 

Robotics Institute 


X 

X 


Comp. & Info. Sci. Dept.


Information Processing 
Institute 

Robotics Res. Lab 


E. E. Dept. 

Elec. & Sys. Engr. Dept.





Table VIII 

Commercial Vision System Developers


Industrial Vision Companies          Large Diversified Manufacturers*


Machine Intelligence Corp. 
Robot Vision Systems 
Videometrix 

Object Recognition Systems 
Octek, Inc, 

Cognex

Spectron Engineering Inc. 
Ham Industries 
Quantcmat 

Image Recognition Systems 
Colorado Video 
Everett Charles 
Inspection Technology 
View Engineering 
Vanzetti

Automated Vision Systems 
Perceptron, Inc. 

Vicom Systems, Inc.
Cyberanimation, Inc.

Reticon


General Electric 
Chrysler Corporation 
General Motors 
International Business Machines
Texas Instruments 
International Harvester 
Westinghouse 
Hughes 

Lockheed - Palo Alto Research Lab. 
Fairchild Camera and Instrument Corp. 
Martin Marietta 

McDonnell Douglas Automation Company
Chesebrough-Pond's


*Some systems are for in-house use only


Robot Manufacturers 

Copperweld Robotics
Unimation
Automatix, Inc.



XII. Who is Funding It

To date, the principal source of funding for computer vision
research has been the U.S. Government, which is estimated to
spend in the order of $10 million a year in this area.

The major U.S. Government program has been the DARPA Image
Understanding Program. Other government agencies funding vision
research are:

NSF (National Science Foundation)

NIH (National Institutes of Health)

NBS (National Bureau of Standards)

ONR (Office of Naval Research)

DMA (Defense Mapping Agency)

NASA (National Aeronautics and Space Administration)

USGS (U.S. Geological Survey)

AFOSR (Air Force Office of Scientific Research).

It is estimated that DARPA (Defense Advanced Research
Projects Agency) spends in the order of $2.5 million a
year on computer vision research. DARPA thrusts include
automatic stereo and terrain mapping, autonomous navigation,
robot vision, symbolic representation, autonomous expert image
systems, and photo analysis aids. DARPA helps support a number
of Image Understanding laboratories at universities where I.U.
work at all levels is performed.

DMA has entered into a very active program in image and
scene analysis. Their goal is to achieve "fully automated"
production for mapping, charting and geodesy by 1995, in which
the primary role of human beings will be to validate the inputs
and the output extracted information. They intend to commence by
furnishing computer vision aids to the cartographer and achieve
the desired high-level automation via an evolutionary route.
Their current approach is to focus a portion of DARPA's image
understanding effort on producing an Image Understanding Testbed
for integrating and evaluating current and emerging computer
vision techniques and systems. An initial version of this
Testbed has been constructed at SRI in Menlo Park, CA (Hanson and
Fischler, 1982). The future emphasis of the Testbed will be on
expert systems that facilitate the application of IU research
results to cartographic problems.

NSF spends roughly $1.5 million a year on a variety of
research topics in computer vision.

NIH spends a substantial sum in obtaining and evaluating
images for a variety of medical research applications. This has
included efforts in semi-automatic cancer screening,
computer-assisted photometry, tomography, image formation and imaging
equipment and various other medical application related areas.
As the focus is application oriented, rather than computer vision
oriented, it is difficult to pinpoint the portion that can be
considered computer vision research. However, a rough guess
might put the figure at one to two million dollars a year.

It has been estimated that NASA spending on image processing
and evaluation approaches one hundred million dollars a year. To
help support this effort, NASA funds somewhat less than one
million dollars a year on research in computer vision*. NASA
spends roughly half this sum to support research at JPL in vision
systems to guide robot manipulation.


*Additional funds have been spent on image processing and
analysis hardware such as the Massively Parallel Processor (MPP)
at the NASA Goddard Space Flight Center.



The National Bureau of Standards has an ongoing in-house
robotics vision research effort, which currently is in the order
of one half million dollars a year.

Collectively, other government agencies probably spend
another one to two million dollars per year for research in this
area. It is estimated that perhaps an additional one to two
million dollars a year is spent by government contractors using
IR&D (Independent Research and Development) funds associated with
their prime contracts.
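
As a rough cross-check, the per-agency estimates quoted in this section can be tallied as ranges and compared with the overall government figure given at the start of the section. The sketch below is illustrative only; in particular, taking NASA's "somewhat less than one million dollars" as 1.0 is an assumption that treats the stated ceiling as the value.

```python
# Estimated annual spending in millions of dollars, as (low, high) ranges
# taken from the figures quoted in this section.
GOVERNMENT = {
    "DARPA": (2.5, 2.5),
    "NSF": (1.5, 1.5),
    "NIH": (1.0, 2.0),
    "NASA": (1.0, 1.0),            # assumption: "somewhat less than one million"
    "NBS": (0.5, 0.5),
    "Other agencies": (1.0, 2.0),
}
CONTRACTOR_IRD = (1.0, 2.0)        # contractor IR&D funds, not direct government spending


def total_range(estimates):
    """Sum the low and high ends of a dict of (low, high) estimates."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


government_total = total_range(GOVERNMENT)
overall_total = (
    government_total[0] + CONTRACTOR_IRD[0],
    government_total[1] + CONTRACTOR_IRD[1],
)
```

Under these assumptions the government total falls between 7.5 and 9.5 million dollars a year, or between 8.5 and 11.5 million when contractor IR&D funds are included.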





XIII. Summary of the State of the Art

A. Human Vision 

Human vision is the only available example of a general
purpose vision system. However, thus far not many AI researchers
have taken an interest in the computations performed by natural
visual systems, but this situation is changing.

The MIT vision group (among others) believes that, to a
first approximation, the human visual system is subdivided into
modules specializing in visual tasks. There is also evidence
that people do global processing first and use it to constrain
local processing.

Considerable information now exists about lower level visual
processing in humans. However, as we progress up the human
visual computing hierarchy, the exact nature of the appropriate
representations becomes subject to dispute. Thus, overall human
visual perception is still very far from being understood.

B. Low and Intermediate Levels of Processing

Though methods for powerful high-level visual understanding and
analysis are still in the process of being determined, insights
into low-level vision are emerging. Alan Mackworth, from the
University of British Columbia, observed at IJCAI-81 that there is
an exciting convergence in the theory of low level vision from
the major vision centers, such as MIT, CMU, SRI and Stanford.
The basic physics of imaging, and the nature of constraints in
vision and their use in computation, are fairly well understood.
Detailed programs for vision modules, such as "shape from
shading" and "optical flow," have begun to appear. Also, the
representational issues are now better understood.




However, even for well understood low-level operations such
as edge detection, there has been no convergence among the many
techniques proposed, and no method stands out as the best. In
general, edge detectors are still unreliable, though Marr and
Hildreth's approach, based on the zero crossings of the second
derivative of the image intensity, appears promising. Brady
(1981B, p. 3) states that operators designed to extract the
"important" intensity changes in an image are still more an art
than a science. Approaches to edge detection consist mostly of
convolving images with local operators tuned to particular
applications. These operators fare badly outside their limited
domain or in the presence of noise.
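
The zero-crossing idea can be illustrated on a single scan line. The sketch below is a one-dimensional simplification under stated assumptions: the intensity profile is smoothed with a Gaussian, a discrete second difference stands in for the second derivative, and sign changes are reported as edges. The two-dimensional operator would use the Laplacian of the smoothed image; the function names and the `sigma` and `min_slope` parameters are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights out to about three standard deviations."""
    radius = int(3 * sigma + 0.5)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal, kernel):
    """Same-length convolution with a symmetric kernel; border values are replicated."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(k * signal[min(max(i + j - radius, 0), last)] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def zero_crossings(intensity, sigma=1.0, min_slope=1e-6):
    """Indices i such that the smoothed second difference changes sign between i and i+1."""
    smoothed = convolve(intensity, gaussian_kernel(sigma))
    second = convolve(smoothed, [1.0, -2.0, 1.0])
    crossings = []
    for i in range(len(second) - 1):
        a, b = second[i], second[i + 1]
        if a * b < 0 and abs(a - b) > min_slope:
            crossings.append(i)            # the edge lies between samples i and i+1
    return crossings
```

A larger `sigma` suppresses noise at the cost of localization, which is one reason a single operator setting rarely suits all images.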

Barrow and Tenenbaum (1981, p. 576) note that the direct
approach to image segmentation is inherently unreliable. A 
number of research groups successfully circumvented this problem 
by integrating segmentation and interpretation. However, this 
approach is not suitable for a general purpose vision system as 
it is based on advance knowledge of the objects to be expected. 

In industrial vision, the primary technique for achieving
robust edge finding and segmentation is to use special lighting
and convert to a silhouette binary image in which edges and
regions are readily distinguishable.

At intermediate levels, edge classification and labelling
have been very successfully used in the blocks world. Barrow and
Tenenbaum (1981, p. 573) believe that the various techniques
developed for dealing with the blocks world could be integrated
into a complete, highly competent vision system for that
domain. Thus far, however, no such system has actually been
built.

Binford (1982) in reviewing existing research in model-based
vision systems observed that most systems first segment regions,
then describe their shape. None of the systems makes effective
use of texture for segmentation and description. In general,
shape description is primitive and interpretation systems have
not yet made full use of even these limited capabilities.

As yet, the extraction of useful information from color is
extremely rudimentary. The perceptual use of motion (optical
flow) has been a focus of attention recently, but findings are
preliminary.

For low level processing, many recent algorithms take the
form of parallel computations involving local interactions. One
popular approach having this character is "relaxation." These
locally parallel algorithms are well suited to rapid parallel
processing using special purpose VLSI chips.
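
A minimal sketch of one common form of relaxation labelling is given below. Each picture element holds a probability for every candidate label, and each iteration adjusts those probabilities according to the support received from neighbouring elements. The update rule shown (multiply by one plus the neighbourhood support, then renormalize) is one standard formulation and is not taken from this report; the data layout and names are assumptions, and compatibilities are assumed to lie between -1 and +1.

```python
def relax(probabilities, neighbours, compatibility, iterations=10):
    """Iteratively update label probabilities from local neighbourhood support.

    probabilities: list of dicts, one per element, mapping label -> probability
    neighbours:    list of lists, neighbours[i] holds the indices adjacent to i
    compatibility: dict of dicts, compatibility[a][b] in [-1, 1] says how well
                   label a on an element agrees with label b on a neighbour
    """
    labels = list(compatibility)
    for _ in range(iterations):
        updated = []
        for i, p_i in enumerate(probabilities):
            raw = {}
            for label in labels:
                # Average support for this label over all neighbours of element i.
                support = sum(
                    compatibility[label][other] * probabilities[j][other]
                    for j in neighbours[i]
                    for other in labels
                ) / len(neighbours[i])
                raw[label] = p_i[label] * (1.0 + support)
            norm = sum(raw.values())
            updated.append({label: raw[label] / norm for label in labels})
        probabilities = updated            # every element is updated in parallel
    return probabilities
```

Because each update depends only on an element and its neighbours, all elements can be processed simultaneously, which is what makes the scheme attractive for parallel hardware.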

C. Industrial Vision Systems 

Barrow and Tenenbaum (1981, p. 572) observe that: 

Significant progress has been made in recent years
on practical applications of machine vision. Systems
have been developed that achieve useful levels of
performance on complex real imagery in tasks such as
inspection of industrial parts, interpretation of
aerial imagery, and analysis of chest X-rays. Virtually
all such systems are special purpose, being heavily
dependent on domain-specific constraints and
techniques. For example, industrial vision systems
usually require high contrast to obtain binary images
and use overhead cameras to minimize variations in
object appearance.

A much more pessimistic view is taken by Krueger and
Thompson (1981, p. 1524) who state that:




Despite substantial research efforts, the study of
computer vision is still in its infancy...
Significant reductions in complexity are possible if
automated perception is limited to an industrial
environment. Even here, however, we still lack a clear
understanding of the fundamental problems that must be
addressed if computer vision is to have a major impact
on manufacturing.

Hiatt (1981, pp. 2, 3) observes that in industry, robot
vision systems are limited to simple repetitive processes, and
that the classic bin of jumbled parts problem still overwhelms
industrial vision systems. However, Birk and Kelley at the
University of Rhode Island have devised algorithms to
successfully pick out parts from a bin on up to 90% of the
computer-vision robot's machine cycles.

Krueger and Thompson (1981, p. 1537) observe that, "The 
current state of the art precludes the construction of one 
general-purpose computer vision system with applicability to all 
industrial vision tasks.... Current systems use no common 
primitives for formal representations of object properties. 
There is also no common programming language for these 
applications. [Current industrial vision systems are limited in 
their flexibility in allowing users to reprogram the system to 
new situations.] This situation will likely improve as computer
vision becomes more integrated into the production process." 

In adapting concepts generated in the research laboratory to 
industrial vision applications, many important additional factors 
come into play such as speed, cost and complexity. It has also 
been found that the lighting and optics play a key role in the 
robustness of an industrial system. Most potential industrial 
vision applications cannot be reduced to working with binary
silhouettes, due to texture and other real-life environmental
factors. Thus, systems engineering is an important ingredient.
Unfortunately, at present many prospective users have inadequate
in-house capability to do the systems planning and integration
needed to successfully adapt computer vision to their operations.
This has inhibited the industrial use of sophisticated vision 
systems. The vision manufacturers are now beginning to try to 
remedy this situation by starting to provide easier user 
programming, friendlier user interfaces, and systems engineering 
support to prospective users. 

It has been estimated that as of mid-1982, though fewer than
50 sophisticated industrial vision systems were actually in use,
approximately 1000 simple line-scan inspection systems were in
regular operation. Though special purpose systems have thus far
been the most effective, successful vision applications are now
becoming commonplace and are expanding. Many firms are now
entering the industrial vision field, with technical leapfrogging
being common due to rapidly changing technology.

D. General Purpose Vision Systems 
1. Introduction

Though many practical image recognition systems have been
developed, Hiatt (1981, pp. 2, 8) observes that, "In current
vision applications, the type of scene to be processed and acted 
upon is usually carefully defined and limited to the capability 
of the machine... General purpose computer vision has not yet 
been solved in practice." This domain specificity makes each new 
application expensive and time consuming to develop. Thus, there 
is a clear need for computer vision systems capable of dealing
with a variety of industrial applications, particularly those
with less structured environments. 

Barrow and Tenenbaum (1981, p. 572) note that "Developing
general-purpose computer vision systems has proved surprisingly
difficult and complex. This has been particularly frustrating
for vision researchers, who daily experience the apparent ease
and spontaneity of human perception. Research in the last few
years, however, has provided new insights into the computational
nature of vision that could lead to systems capable of high
performance in a broad range of visual domains."

Brady (1981A) observes that there has been a research shift
toward topics corresponding to identifiable modules in the human
vision system, and away from particular domains of application.
The consequence has been a sharp decline in the construction of
entire vision systems.

2. Difficulties 


Barrow and Tenenbaum (1981, p. 574) emphasize that 

Model-based interpretation of image data is an
enormously complex computational task. The variety of 
possible scene configurations and viewpoints is so 
great that an exhaustive search through the space of 
possible interpretations is out of the question. Only 
the most promising or most important alternative 
interpretations can be pursued. Selection of candidate 
interpretations depends both upon information derived 
from the input image, and upon the observer's goals and 
expectations. A delicate balance must be struck 
between data-directed and goal-directed search to avoid 
oversight (not seeing things that are really present) 
and hallucination (seeing things that are not). 

Gennery et al. (1981, pp. 10-1, 10-3, 3-6) observe that 

The statement "Vision is hard" is found often in
the computer vision literature. There are several
reasons for the difficulty. In the first place, an
image contains an enormous amount of information, much
of it irrelevant to the task at hand, and it is an
imperfect projection of the real world, containing
noise and distortion. From this the relevant
information must be extracted. In the second place,
the transformation from the image to the real world is
highly ambiguous. Thus world knowledge must be relied
on to resolve the ambiguities. (This is especially
true in monocular vision of three-dimensional scenes,
but it is also true to a lesser extent in stereo
vision.) In the third place, an object seen may only
vaguely resemble others of its generic type or even
itself at other times or under other conditions. In
the fourth place, in a powerful vision system an
object must be recognized out of a large number of
possible objects or generic types.

These facts appear to manifest themselves in two
ways in practice. First, vision requires an enormous
amount of computing. Second, it seems that the
computational methods needed are very complicated, and
it is unknown today what the right methods will be...

Some experimental systems hold promise for
recognition of generic three-dimensional objects,
although they require a large amount of computing time
on existing computers. Some special-purpose hardware
is becoming available, which enables some very low-level
computations to be performed rapidly. Even in
these cases, however, a variety of techniques are in
use, with no consensus about which are the best. This
becomes even truer as we move to the higher-level, more
general, or more advanced areas. Furthermore, many of
the approaches that have been used are ad hoc, with
little promise of generality.

...two tasks that are beyond the capability of any
existing computer vision system are the recognition of
parts in a jumble in a bin and the operation of a
robot vehicle in a complicated outdoor environment.

Rosenfeld (1981, p. 3) observes that "Image processing and
scene analysis have definitely saturated the capacity of
computers."

In relation to earth observation imagery for resources
management, Alan Mackworth of UBC stated at IJCAI-81 that it will
be necessary to alter the popular current multi-spectral paradigm
that pixel meaning can be determined by intensity alone -- it
doesn't work. It is necessary to understand spatial
organization, meaning, and context. The spatial constraints are
very important. There is no chance of getting a general purpose
vision system to understand satellite imagery alone -- it is
necessary to use a system-generated "sketch map" to interact with
the scene.

3. Techniques 

Brady (1981B, p. 4) observes that, "Most AI workers have...
abandoned the idea that visual perception can profitably be
studied in the context of a priori commitment to a particular
program or machine architecture." Binford (1982) believes
"...that building a vision system is 1% a system effort of the
sort which are familiar in computer science, and 99% basic
science."

The research emphasis has moved to developing techniques
(vision modules) for extracting intrinsic images (shape from
shading, shape from texture, etc.). Brady (1981A, p. 6) observes
that, "Representations have been developed that make explicit
the information computed by a module... [This] leads to a view of
visual perception as the process of constructing instances of a
sequence of representations."

Gennery et al. (1981, p. 3-1) note that at higher levels of
description it becomes difficult to judge what are the best
approaches. As a result, a wide variety of techniques have been
used.

Brady (1981, p. 99) observes that though it appears that the
most difficult visual problem is the perception or planning of
movements through cluttered space, a solid start has been made on
this problem by Lozano-Perez (1981).

4. Conclusions 

Binford (1982) in reviewing current model-based research
vision systems concludes that most systems have not attempted to
be general vision systems, though ACRONYM does demonstrate some
progress toward this goal. Existing vision systems' performances
are strongly limited by the performance of their segmentation
modules, their weak use of world knowledge and weak descriptions,
making little use of shape. The systems primarily relate image
relations to image observables, in general lacking the ability to
relate three dimensional space models to images. Existing
systems show little emphasis on basic vision problems in systems
building.

Binford observes that until recently, systems efforts have
been small and short-lived, generally only a few man-years of
effort. Focussed and continuous efforts are necessary but not
sufficient for system building. The system programming effort
alone in building a vision system is enormous.

With the exception of ACRONYM (and to an extent MOSAIC), the
systems surveyed depend on image models and relations, and
therefore are strongly viewpoint-dependent. To generalize to
viewpoint-insensitive interpretations would require
three-dimensional modeling and interpretation as in ACRONYM.

Binford found that the systems jump to conclusions based on
flimsy evidence which would probably not distinguish many objects
in a complex visual environment. The systems typically use the
hypothesis-verification paradigm. Hypothesis generation is the
crucial part, made easy in the top-down case. The systems
succeed best with quasi-2D scenes, for example aerial
photographs, industrial scenes from a fixed viewpoint, x-ray
images, and ground level photos from a fixed viewpoint. Even
ACRONYM, which incorporates viewpoint-insensitive mechanisms, has
been demonstrated only on aerial images, although it appears
applicable to ground level photographs as well.

Binford concludes that though the results of these and
other efforts are encouraging as first demonstrations,
nevertheless as general vision systems, they have a long way to
go.


Tenenbaum and Barrow (1981, p. 594) in discussing the
general computer vision problem conclude that:

We are beginning to understand the computational 
nature of vision at a fundamental level, independent of 
implementation. This understanding provides new 
insights into limitations of early scene analysis 
systems and a solid scientific foundation upon which 
future general-purpose high-performance computer vision 
systems can be built... 

The competence of a vision system ultimately 
rests upon the representations it uses to describe the 
world and the models available for manipulating and
transforming descriptions. Many levels of description 
are necessary to achieve human performance requiring 
models of scene domains, objects, surfaces, 
illumination, sensors, and the geometry and photometry 
of imaging. 

A vision system is naturally structured as a 
sequence of levels of representation. The initial 
levels are primarily iconic (edges, regions, gradients) 
because that is the nature of the information available 
directly from an image. The highest levels are 
primarily symbolic (surfaces, objects, scenes), because 
that is the nature of the information that is sought. 
Intermediate levels are constrained by the information
available from preceding levels and that required by
subsequent levels. In particular, physical and
three-dimensional surface characteristics provide a critical
transition from iconic to symbolic representations.

Early levels of processing in a vision system are
primarily data driven, while higher levels are
controlled by goals and expectations. At intermediate
levels, some combination of data-driven (bottom-up) and
goal-driven (top-down) operation is needed both to
compensate for errors, and to avoid computational 
overload. Although the detailed nature of processing 
is dependent on representation and therefore 
considerably different at low and high levels, it is 
significant that at virtually all levels processing 
appears to be inherently parallel, and thus amenable to 
implementation by networks of computational elements 
(e.g., neurons or VLSI chips)... 

While no such [general-purpose computer vision] 
system yet exists, most of the pieces have been 
experimentally demonstrated. Thus it would not be 
unreasonable to attempt to construct one within the
current state of the art. Of course, many details 
still remain unresolved, especially at the higher 
levels of processing. 

E. Visual Tracking

Real-time tracking of objects is important to manipulation
and guidance. The state-of-the-art in visual tracking is
reviewed in Appendix H. Though some success has been achieved
under limited conditions, it remains an important area for
research.

F. Overview 


In conclusion, we might observe that computer vision can be
viewed as a set of very difficult problems. However, commercial
vision systems are available and are operating successfully in
specialized environments on low level problems of verification,
inspection, measurement, recognition, and determination of
object location or orientation.

There is now a much better understanding of the computer
vision problem than there was just a few years ago. A major
focus of the current research effort is in extracting 3D shape
from intrinsic image characteristics.

Though quite a number of high level research vision systems
have been explored, no general vision system is available today
or is imminent. Major current efforts in this area are ACRONYM
at Stanford U. and VISIONS at the U. of Mass.




XIV. Current Problems and Issues
A. General 

Some of the general issues are: 

• Can general vision be reduced to computer analysis? 

- What assumptions about the world are restrictive 
enough? 

- How much data is required? 


• Need to incorporate generic aspects of perception
(Binford, 1982)

- Similarity, not spatial congruence, is the paradigm
of interpretation in nature.

- Humans are always seeing things they haven't seen
before.

• 3D interpretation of images versus 2D

- Binford (1982) believes that general vision 
systems depend on building three dimensional 
descriptions — that prediction, description and 
interpretation take place largely in three 
dimensions. 

• How necessary is it to follow the central paradigm
(Figure 6, in this report) to achieve high level
vision? Is it essential to employ a key intermediate
representation such as the 2-1/2D sketch or Intrinsic
Images? Is it possible to obtain these using only
local constraints?

• Is the hierarchical vision paradigm (Figure 6), which
implies complete segmentation and labeling,
inappropriate for natural scenes? Is a more isomorphic
representation needed, such as a map which implicitly
captures the detail and relations and is more
appropriate for natural computations? For such
isomorphic representations, is the serial digital
computer inappropriate and another calculating medium
such as a network needed? (See Fischler, 1978, 1981.)

• Methods and hardware to reduce the software generation
costs and processing time for computer vision**.


**Nudd (1980) provides a good overview of computer processing
requirements for computer vision and appropriate architectures
and hardware to implement them.





* Lack of interface standards for connecting computer
vision systems to robots and industrial machines.

* Active vs passive sensing in vision systems.

* Relative merits of binary versus grey-scale imagery.

* Most issues are still poorly understood.

B. Techniques

1. Low Level Processing

* Many of the unsolved problems in computer vision 
are at this level. 

* The whole issue of constructing the primal sketch
from zero crossings of the second derivative of
the intensity is far from resolved.

* Direct edge finding and region segmentation are 
still unreliable for general vision. 


* A key insight is that local information is usually
inadequate to guide segmentation and
interpretation in a general scene. Global
structure such as the shading gradient is
required. To what extent can modeling and using
physical constraints in scene analysis provide
global restrictions which can guide segmentation
and assist in classification? Possible examples
include 1) utilization of shadows to locate
lighting sources and to pinpoint objects casting
shadows, and 2) use of sky-land boundaries as a
global constraint. Another global approach for
man-made scenes is to employ the camera model and
geometric perspective to detect vanishing points
associated with parallel lines in an urban scene.
Fischler et al. (1982) indicate that the detection
of clusters of parallel lines by finding their
vanishing points can be used to automatically
screen large amounts of imagery for man-made
structures.

* How to best utilize and avoid difficulties with 
texture in natural scenes is still unsolved. 

* Rectification of images prior to stereo matching 
remains a problem. 
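
The vanishing-point idea raised in the list above can be illustrated with a small sketch. It is not the procedure of Fischler et al. (1982); it only shows, under simplifying assumptions, how image line segments that are parallel in the scene can be intersected in homogeneous coordinates to estimate a common vanishing point. The function names, the tolerance `eps`, and the plain averaging of pairwise intersections are illustrative choices and do not come from this report.

```python
from itertools import combinations


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through two image points p = (x, y), q = (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they do not
    meet at a finite image point (eps is a hypothetical tolerance)."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


def vanishing_point(segments):
    """Estimate a vanishing point as the mean of all pairwise finite
    intersections of the given segments (each a pair of points)."""
    lines = [line_through(p, q) for p, q in segments]
    points = []
    for a, b in combinations(lines, 2):
        pt = intersection(a, b)
        if pt is not None:
            points.append(pt)
    if not points:
        return None
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
```

A practical detector would first have to group segments into candidate clusters and would need a robust estimate rather than a plain mean, since real edge segments are noisy; those steps are outside this sketch.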

2. Middle Level Processing

* A key problem remaining in computer vision is 
bridging the gap between pictorial features (e.g., 
edges and regions) and 3D objects. 





• Techniques for analysing time-varying imagery. 

• Limits of the intrinsic image approach: It is not
clear that one can reliably obtain intrinsic images
from images of real scenes via the methods
outlined in this report. Alternative approaches
when available, such as stereo or active ranging
sensors, may be preferable for extracting
intrinsic characteristics.

• How best to deal with shadows and occlusions.

3. Higher Level Processing

• Relation of higher level vision to AI

• Modules that operate on the surface orientation
map to produce object representations.

• Generic interpretation in terms of object classes. 

• Semantic interpretation. 

• Semantic search techniques for use in matching
schemes using semantic segmentations and indexing.

• Identification for interpretation of which
geometric parameters are causally (functionally)
rather than statistically determined (Binford,
1982).

C. Representation and Modeling 

* Representations for complex and amorphous shapes (e.g.,
a tree, a crumpled sweater, a flowing stream).

* Proper level for the dividing line between iconic 
representations at the lower levels and symbolic at the 
higher levels, and how much these representations 
should overlap. 

* How to index efficiently into a database containing a
large number of models. 

* What sort of features should be extracted from the
scene (edges, corners, regions, surface orientation,
etc.) and how should objects be modeled (wire-frame
models, generalized cylinders, etc.)?

D. System Paradigms and Design 

* Is the relaxation process the most attractive approach 
at the lower levels where global aspects are not 
directly considered? 





* How far can parallel methods (like relaxation) be
pushed at all levels?

* Is a combination of top-down and bottom-up the
preferred approach for complex vision tasks?

* Under what circumstances is the blackboard approach to
be preferred? Note that hierarchical image
understanding systems suffer from lack of adaptability
and also require a large amount of processing.

E. Knowledge Acquisition - Teaching and Programming

* Methods for knowledge acquisition at all levels.

* Methods for learning and teaching of generic types.

* How to make the system versatile by having it
programmable at a very high level.

* Design of a very-high-level programming language
especially for vision. Little has been done to date in
this area.

F. Sensing 

* Active vs passive sensing.

* Best methods for using structured light.

* Methods for acquiring 3D directly.

* Is scanning laser radar the wave of the future?

G. Planning

* Methods to incorporate planning into robotic systems
utilizing vision.







XV. Research Needed

A. General 

• Need to understand human vision, as it is our best
example of a general purpose vision system.

• Need research in general purpose systems capable of
high performance in a wide variety of visual domains.

• Need to be able to use generic recognition.

• Methods to reduce software costs of computer vision and 
reduce processing time. 

• Image processing techniques for greater capability.

• Interface standards. 

• Methods for visual guidance in cluttered spaces.

• Improved understanding of the extent and use of domain 
specific information in visual perception. 

• Need to determine how to best utilize range
information.

• Need to develop global methods (e.g., utilizing
shading) either to bypass or to help guide the current
hierarchical paradigm.

B. Techniques

1. Low level processing

* More reliable and faster edge and region finders 
for general scenes. 

* Ways to extract motion measures from sequences
of intensity arrays.


• Reliable stereo disparity modules. 

• Determination of surface properties, such as 
color, smoothness, coatings, etc. 

2. Middle level processing

• Techniques for analyzing time-varying imagery.

• Methods to bridge the gap between edges and
regions and 3D objects.

• Improved methods of extracting intrinsic images. 







3. High level processing


• Better understanding of how to use texture.

• Modules that operate on the surface orientation
map to produce object representations.

• Methods for generic and semantic interpretation. 

C. Representation and Modeling 

• Representations for complex and amorphous shapes. 

• Techniques for indexing into a large data base of
models.

• Mathematical methods to model texture conveniently.

• More precise representations of surface orientation 
maps at different levels of resolution. 

• Methods to group properties at each level of resolution
for each representation, so that a hierarchical
structure can be imposed upon the representations.

• Determination of the conditions under which binary
imagery is most favorable and those under which
gray-scale is to be preferred.

• Choosing which features to extract from a scene.

• Methods for modeling 3D objects.

D. System Paradigms and Design 

• Need to explore the extension of relaxation processes
to multiple levels in the pyramid of description and
interpretation.

• Methods of local parallel processing which can discover
global information through propagation.

• Efficient methods and techniques for maintaining
concurrently a number of images of a scene in various
stages of processing, so that these explicitly
represented images can interact with each other and
with higher and lower levels of processing as
processing proceeds. This is especially pertinent to
globally consistent relaxation processing.

• Need investigation of paradigms, other than the 
conventional hierarchical paradigms, such as the 
"blackboard" and other paradigms being explored in 
Artificial Intelligence areas such as expert systems. 



• Faster pre- and post-processing hardware (e.g., special
digital circuits to evaluate intensity gradients).
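
As a hedged illustration of how purely local, parallel operations can discover global information through propagation, the sketch below labels connected regions of a binary image. Every foreground pixel starts with its own label and repeatedly takes the smallest label among itself and its four neighbours, with all pixels updated synchronously from the previous pass. This is plain label propagation, chosen for brevity; it is not the probabilistic relaxation labeling discussed elsewhere in this report, and the function name is hypothetical.

```python
def propagate_labels(image):
    """Label 4-connected foreground regions of a binary image (list of
    lists of 0/1) using only local, synchronous neighbourhood updates.
    Background pixels get None; each region ends with its minimum index."""
    rows, cols = len(image), len(image[0])
    # Initial state: every foreground pixel has a unique label.
    labels = [[r * cols + c if image[r][c] else None for c in range(cols)]
              for r in range(rows)]
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in labels]  # synchronous ("parallel") update
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] is None:
                    continue
                best = labels[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] is not None):
                        best = min(best, labels[rr][cc])
                if best != labels[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels
```

Each pass uses only a pixel's immediate neighbourhood, yet the final labels encode a global property (region membership). The number of passes grows with the longest path inside a region, which is one reason multi-level (pyramid) extensions of such processes are of interest.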

E. Knowledge Acquisition - Teaching and Programming 

• Need better techniques for rapid reprogramming of
vision systems by the user.

• Inspection and assembly vision system software
approaches that can easily be modified to adapt to new
situations. It would be desirable that control
structures be incorporated that will specify tests to
be performed and possible alternate paths of action.

• Need high-level programming languages designed 
especially for vision. 

• Need methods for learning and teaching of generic
types.

• Methods for knowledge acquisition at all levels. 

F. Sensing

• Techniques for rapid 3D sensing: ranging by lidar
(scanning laser radar), defocussing, stereo and
triangulation.


* Improved methods for use of structured light for 3D 
evaluation and shape identification. 

• Methods for exploiting multiple light sources. 

• Higher-resolution and selective-resolution transducers. 
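
To make the stereo and triangulation items above concrete, here is a minimal sketch of the standard triangulation relation for a rectified two-camera pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. The pinhole model, the rectified geometry, and the function name are assumptions made for illustration; nothing here describes a particular system covered in this report.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth (in metres) of a point seen by a rectified stereo pair.

    Assumes a pinhole camera model: focal_px is the focal length in
    pixels, baseline_m the camera separation in metres, and
    disparity_px the horizontal shift of the point between the images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px
```

Because depth varies inversely with disparity, a fixed error in disparity produces a depth error that grows roughly with the square of the distance, which is one reason reliable disparity modules and higher-resolution transducers both appear in the lists above.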

G. Planning

* Methods for incorporating vision into robotic planning. 





XVI. Future Trends 


As the field of computer vision unfolds, we expect to see
the following future trends*.

A. Techniques 

* Though most industrial vision systems have used binary
representations, we can expect increased use of gray
scales because of their potential for handling scenes
with cluttered backgrounds and uncontrolled lighting.

* Recent theoretical work on monocular shape
interpretation from images (shape from shading,
texture, etc.) makes it appear promising that general
mechanisms for generating spatial observations from
images will be available within the next 2 to 5 years
to support general vision systems.

* Successful techniques (such as stereo and motion
parallax) for deriving shape and/or motion from
multiple images should also be available within 2 to 5
years.

* The mathematics of Image Understanding will continue to
become more sophisticated.

* Enlargement will continue of the links now growing
between Image Understanding and Theories of Human
Vision.

* Brady (1981B, p. 11) predicts that there will be
considerable advances in current vision "...issues over
the next few decades, probably resulting in changes in
our conception of computing and vision at least as
large as those which have occurred over the past
decade."

B. Hardware and Architecture 

* We are now seeing hardware and software emerging that 
enables real-time operation in simple situations. 
Within the next 3 to 5 years we should see hardware and 
software that will enable similar real-time operation 
for robotics and other activities requiring 
recognition, and position and orientation information.


*These trends have been largely derived from statements by Brady
(1981A, 1981B), Binford (1982), Kruger and Thompson (1981), Agin
(1980), Arden (1980), Rosenfeld (1981), Hiatt (1981), and Barrow
and Tenenbaum (1981).





* Fast raster-based pipeline preprocessing hardware to
compute low-level features in local regions of an
entire scene is now becoming available and should find
general use in commercial vision systems in 2 to 4
years.

* As processing seems inherently parallel at virtually
all visual levels, parallel processing is a wave of
the future (but not the entire answer). Parallel
processing research hardware systems (such as ZMOB at
the U. of MD, and the MPP for NASA Goddard) have
already been built, and appropriate algorithms are
being developed.

* Three possible parallel processing architectures are
array processing, pipeline processing and
multi-processing. Multi-processing looks most promising as
it allows data from several data streams of an image to
interact with each other to yield a high-level
representation.

* Relaxation and constraint analysis techniques are on
the increase and will be increasingly reflected in
future architectures.

C. A.I. and General Vision Systems

Computer vision will be a key factor in achieving many
artificial intelligence applications. The goal is to move from
special-purpose visual processing to general-purpose computer
vision. Work to date in model-based systems has made a tentative
beginning. But the long-run goal is to be able to deal with
unfamiliar or unexpected input**. Reasoning in terms of generic
models and reasoning by analogy are two approaches being pursued.
However, it is anticipated that it will be a decade or more
before substantial progress will be made.


**As computer vision systems move toward this goal, they will
increasingly incorporate Expert System components using multiple
knowledge sources. Gevarter (1982B) provides An Overview of
Expert Systems, in which ACRONYM and VISIONS are considered to be
examples of Expert Systems.








Barrow and Tenenbaum (1981, p. 594) indicate that no general
vision system now exists, but most of the pieces have been
experimentally demonstrated. They conclude that it would not be
unreasonable to attempt to construct one within the current
state-of-the-art.

D. Modeling and Programming

* Now emerging is 3D modeling, arising largely from
CAD/CAM technology. 3D CAD/CAM data bases will be
integrated with industrial vision systems to
realistically generate synthesized images for matching
with visual inputs.

* Illumination models, shading and surface property
models will be increasingly incorporated into visual
systems.

* Volumetric models which allow prediction and 
interpretation at the levels of volumes, rather than 
images, will see greater utilization. 

* High level vision programming languages (such as
Automatix's RAIL) that can be integrated with robot and
industrial manufacturing languages are now beginning to
appear and will become commonplace within 5 years.

* Generic representations for amorphous objects (such as
trees) have been experimentally utilized and should
become generally available within 5 years.

E. Knowledge Acquisition

* Strategies for indexing into a large database of models
should be available within the next 2 to 5 years.

* "Training by being told" will supplement "training by
example" as computer graphics techniques and vision
programming languages become more common.


F. Sensing 


• An important area of development is 3D sensing.
Several current industrial vision systems are already
employing structured light for 3D sensing. A number of
new innovative techniques in this area are expected to
appear in the next 5 years.

• More active vision sensors such as lidar are now being
explored, but are unlikely to find substantial
industrial application until the last half of this
decade.








• A number of other innovative techniques in 3D sensing
are now being developed. Among these are the use of
multiple light sources, multiple views, and shape from
motion. Some of these techniques may see commercial
application within the next two years.

• Kruger and Thompson (1981) observe that "By
taking several views from particular positions and
with carefully controlled illumination, it is possible
to separate and independently measure the different
surface properties." Industrial vision systems for
inspection that use this technique will probably appear
within the next several years.

• It is anticipated that within two years solid-state 
cameras and convolvers will become available that will 
make stereo machine vision a reality. 

G. Industrial Vision Systems 

• We will see increased use of advanced vision techniques
in industrial vision systems, including gray scale
imagery.

• We are now observing a shortening time lag between
research advances and their applications in industry.
It is anticipated that in the future this lag may be as
little as one to two years.

• Advanced electronics hardware at reduced cost is
increasing the capabilities and speed of industrial
vision, while simultaneously reducing costs.

• Because of low start-up costs and the importance of
vision to industrial and other applications, new
companies and organizations are rapidly entering the
vision field.

• It has been estimated that more than 200 companies are
now playing a role in the vision field. A shakeout
appears likely as the field settles down, but
innovation will continue to encourage new entrants.

• It is anticipated that special lighting and active
sensing will play an increasing role in industrial
vision.

• Better human/machine interfaces simplifying user
reprogramming are now appearing and will become
dominant in sophisticated applications within 5 years.

• Common programming languages and improved interface
standards will within the next 3 to 10 years enable
easier integration of vision to robots and into the
industrial environment.


H. Future Applications


• It is anticipated that about one quarter of all
industrial robots will be equipped with some form of
vision system by 1990.


• Arden (1980, p. 487) observes that "Increasingly,
computer-vision techniques are being applied to real-
world problems. This is particularly true of device
assembly, circuit board layout, and inspection in the
field of industrial automation. Although much of the
work is still going on, several convincing
demonstration programs have been written, and it is
expected that computer vision will soon begin to have a
significant impact in industry. At the same time, the 
computer-vision approach will increasingly be applied 
to the analysis of images by computer, areas which up 
to now have been the domain of researchers in pattern 
recognition — for example, the analysis of handwriting, 
photomicrographs and radiographs, and satellite 
imagery." 


• It is likely that on the order of 90% of all industrial
inspection activities requiring vision will be done
with computer vision systems within the next decade.


• New vision system applications in a wide variety of
areas, as yet unexplored, will begin to appear within
this decade. An example of such a system might be
visual traffic monitors at intersections that could
perceive cars, pedestrians, etc., in motion, and
control the flow of traffic accordingly.


• Computer vision will play a large role in future
military applications. The Defense Mapping Agency
intends to achieve fully automated production for
mapping, charting and geodesy by 1995, utilizing
"expert system"-guided computer vision facilities.
Other future computer vision military applications
include autonomous navigation and guidance for vehicles
and missiles, target detection, the interpretation of
aerial images for general surveillance purposes and for
local battlefield surveillance. Computer vision will
also play a large role in future battlefield robots.

• Table X gives Binford's (1982) forecast for computer
vision system applications.





Table X. Example Future Applications for Computer Vision Systems*

• Short term (1-2 years)

    Industrial vision systems
    Cartography: semi-automated stereo for terrain mapping

• Mid term (2-3 years)

    Cartography: semi-automated stereo mapping of complex cultural sites
    Photointerpretation: monitoring of selected objects in restricted situations

• Long term (3-5 years)

    3D systems for:
      - warehousing
      - handling unoriented parts
      - inspection of non-laminar parts
    Cartography: automatic feature classification
    Photointerpretation: automatic classification of a greater variety of
      objects with greater detail

• Greater than 5 years

    Robotic operations in hazardous environments
    Autonomous navigation
    Vehicle guidance
    Medical image analysis
    Aids to handicapped

• More than a decade

    Home robots
    General robotic activities
    Observations of extra-terrestrial bodies


*Based on Binford (1982)








I. Conclusion 

In conclusion, the amount of activity and the many
researchers in the computer vision field suggest that within the
next 5 to 10 years, we should see some startling advances in
practical computer vision, though the availability of practical
general vision systems still remains a long way off.





REFERENCES


The following abbreviations are used for some journals and
conference proceedings:


AI        Artificial Intelligence

CGIP      Computer Graphics and Image Processing

T-PAMI    IEEE Transactions on Pattern Analysis and Machine Intelligence

1AAAI     First Annual National Conference on Artificial Intelligence,
          The American Association for Artificial Intelligence,
          Stanford University, August 1980

AAAI-82   Proceedings of the National Conference on Artificial Intelligence,
          Univ. of Pittsburgh, August 1982

4ICPR     Fourth International Joint Conference on Pattern Recognition,
          Tokyo, November 1978

3IJCAI    Third International Joint Conference on Artificial Intelligence,
          Stanford University, August 1973

5IJCAI    Fifth International Joint Conference on Artificial Intelligence,
          Cambridge, Mass., August 1977

6IJCAI    Sixth International Joint Conference on Artificial Intelligence,
          Tokyo, August 1979

IJCAI-81  Seventh International Joint Conference on Artificial Intelligence,
          August 1981




References 

Albus, J., Kent, E., Nashman, M., Mansbach, P., and Palombo, L., "A 6-D
Vision System," Proceedings of SPIE Technical Symposium East '82, Arlington,
VA, May 5-7, 1982.

Aggarwal, J. K., Duda, R. O., and Rosenfeld, A. (eds.), Computer Methods in
Image Analysis. IEEE Press & Wiley, 1977.

Agin, G. J. and Binford, T. O., "Computer Descriptions of Curved Objects,"
3IJCAI, 1973, pp. 629-640.

Agin, G. J., "Computer Vision System for Industrial Inspection and Assembly,"
Computer, May 1980.

Arden, B. W. (ed.), What Can Be Automated? Cambridge: MIT Press, 1980,
pp. 482-487.

Baird, M. L., "Sight-I: A Computer Vision System for Automated IC Chip
Manufacture," IEEE Trans. Syst. Man Cybern., vol. SMC-8, 1978.

Ballard, D. H., Brown, C. M., and Feldman, J. A., "An Approach to Knowledge-
Directed Image Analysis," in Hanson and Riseman 1978a, pp. 271-281.

Ballard, D. H. and Brown, C. M., Computer Vision. Englewood Cliffs:
Prentice Hall, 1982.

Barrow, H. G. and Tenenbaum, J. M., "MSYS: A System For Reasoning About
Scenes", SRI AI Center, Tech. Note 121, March 1976.

Barrow, H. G. and Tenenbaum, J. M., "Recovering Intrinsic Scene Characteristics
from Images," in Hanson and Riseman, 1978a, pp. 3-26.

Barrow, H. G., "Artificial Intelligence: State-of-the-Art". SRI International,
Menlo Park, CA, Tech. Note 198, October 1979.

Barrow, H. G. and Tenenbaum, J. M., "Interpreting Line Drawings as
Three-Dimensional Surfaces," 1AAAI, 1980, pp. 11-14.

Barrow, H. G. and Tenenbaum, J. M., "Computational Vision". Proceedings of
the IEEE, Vol. 69, No. 5, May 1981, pp. 572-595.

Binford, T. O., "Inferring Surfaces from Images," Artificial Intelligence,
Vol. 17(1-3), 1981, pp. 205-244.

Binford, T. O., "Survey of Model-based Image Analysis Systems,"
Robotics Research, Vol. 1, No. 1, Spring 1982.

Bolles, R. C., "Verification Vision Within a Programmable Assembly System,"
AIM-295, STAN-CS-77-591, Computer Science Dept., Stanford University, 1976.

Bolles, R. C., "Robust Feature Matching Through Maximal Cliques,"
SPIE, Vol. 182, Imaging Applications for Automated Industrial Inspection
and Assembly, 1979.

Brady, M., "Computational Approaches to Image Understanding," M.I.T. A.I.
Memo No. 653, October 1981A. (Also in Computing Surveys, Vol. 14, No. 1, Mar.
1982, pp. 3-71).

Brady, M., "The Changing Shape of Computer Vision," Artificial Intelligence,
17(1-3), 1981B, pp. 1-15.






Brooks, R., Greiner, R., and Binford, T. O., "The ACRONYM Model-Based
Vision System," Proc. Int. Jt. Conf. Artificial Intelligence (6IJCAI),
1979, pp. 105-113.

Brooks, R., "Symbolic Reasoning among 3-D Models and 2-D Images,"
AI 17, Aug. 1981, pp. 285-348.

Brooks, T. L., "Supervisory Manipulation Based on the Concepts of
Absolute vs. Relative and Fixed vs. Moving Tasks," Proceedings of the
International Computer Technology Conference, sponsored by ASME,
San Francisco, August 1980.


Chin, R. T., "Automated Visual Inspection Techniques and Applications:
A Bibliography," Pattern Recognition, Vol. 15(4), 1982, pp. 343-357.

Chin, R., Harlow, C. A., Dwyer, S. J., "Automated Inspection Techniques,"
in Proc. Assembly IV Conf. (Detroit, MI), Nov. 1977.

Clowes, M. B., "On Seeing Things," AI 2, 1971, pp. 79-116.

Cohen, P. R., and Feigenbaum, E. A., Chap. XIII, Vision, The Handbook
of Artificial Intelligence, Vol. III, Los Altos, CA: Kaufmann,
1982, pp. 125-321.


Draper, S. W., "The Use of Gradient and Dual Space in Line-Drawing
Interpretation," AI 17, Aug. 1981, pp. 461-508.

Duda, R. O. and Hart, P. E., Pattern Recognition and Scene Analysis,
Wiley, 1973.

Eberlein, R. B., "An Iterative Gradient Edge Detection Algorithm,"
CGIP 5, 1976, pp. 245-253.

Faugeras, O., Price, K., "Semantic Description of Aerial Images Using
Stochastic Labeling", Proc. ARPA Image Understanding Workshop,
Univ. of Md., April 1980.

Fennema, C. L. and Thompson, W. B., "Velocity Determination in Scenes
Containing Several Moving Objects," CGIP 9, 1979, pp. 301-315.


Fischler, M. A., "On the Representation of Natural Scenes," in
Computer Vision Systems, Hanson and Riseman (Eds.), (1978a),
pp. 47-52.

Fischler, M. A., "Computational Structures for Machine Perception,"
in Advanced Computer Concepts, J. C. Solinsky (ed.), La Jolla
Inst., 1981.

Fischler, M. A., and Bolles, R. C., "Random Sample Consensus: A
paradigm for model fitting with applications to image analysis and
automated cartography," CACM, Vol. 24(6), June 1981, pp. 381-395.

Fischler, M. A., Tenenbaum, J. M., and Wolf, H. C., "Detection of
roads and linear structures in low-resolution aerial imagery
using a multisource knowledge integration technique," CGIP, Vol.
15(3), March 1981, pp. 201-223.

Fischler, M. A., Barnard, S. T., Bolles, R. C., Lowry, M., Quam,
L., Smith, G., and Witkin, A., "Modeling and Using Physical
Constraints in Scene Analysis," AAAI-82, Aug. 1982, pp. 30-35.

Garvey, T. D., "Perceptual Strategies for Purposive Vision",
SRI AI Center Tech. Note 117, 1976.

Jarvis, J. F., "Automated Visual Inspection of Printed Wiring Boards by
Local Pattern Matching," IEEE Trans. Patt. Anal. Machine Intelligence,
vol. 2, pp. 77-82, Jan. 1980.

Gennery, D. B., "Modelling the Environment of an Exploring Vehicle
by Means of Stereo Vision," AIM-339, STAN-CS-80-805, Computer Science
Dept., Stanford University, 1980.

Gennery, et al., "Computer Vision", JPL Publ. 81-92, Nov. 1, 1981.



Ci^iULl ^££.l£iv, VJoah, D.C., B'epT: Ty 7 YTpb . ui> |l|§§7 * C "~ — 

Gevarter, W. B., An Overview of Artificial Intelligence and Robotics,
Vol. II - Robotics, National Bureau of Standards, Wash.,
D.C., March 1982A.

Gevarter, W. B., An Overview of Expert Systems, NBSIR 82-2505, National
Bureau of Standards, Wash., D.C., 1982B.

Gilbert, A. L., Giles, M. K., Flachs, G. M., Rogers, R. B., and U, Y. H., "A
Real-Time Video Tracking System," T-PAMI 2, 1980, pp. 47-56.

Griffin, M. D., Cunningham, R. T., and Eskenazi, R., "Vision-Based
Guidance of an Automated Roving Vehicle," AIAA Guidance
and Control Conference, Palo Alto, August 1978.


Hanson, A. J. and Fischler, M. A., "The DARPA/DMA Image Understanding
Testbed," Proceedings of the Image Understanding Workshop, Stanford
U., Sept. 1982.

Hanson, A. R. and Riseman, E. M. (eds.), Computer Vision Systems, New York:
Academic Press, 1978a.

Hanson, A. R. and Riseman, E. M. (1978b), "Segmentation of Natural
Scenes," in Hanson and Riseman (1978a), pp. 129-163.

Hanson, A. R. and Riseman, E. M. (1978c), "VISIONS: A Computer System for
Interpreting Scenes," in Hanson and Riseman (1978a), pp. 303-333.

Herman, M., Kanade, T., and Kuroe, S., "Incremental Acquisition of
a Three-dimensional Scene Model from Images," Proc. of the DARPA
I.U. Workshop, Stanford U., Sept. 1982.

Hiatt, B., "Toward Machines that See," Mosaic, Nov/Dec 1981, pp. 2-8.

Hirzinger, G. and Snyder, W., "Analysis of Time-Varying Imagery for
Tracking Moving Objects," Proceedings of the International Computer
Technology Conference, sponsored by ASME, San Francisco, August 1980.

Holland, S. W., Rossol, L. and Ward, M. R., "Consight-I: A Vision
Controlled Robot System for Transferring Parts from Belt Conveyors,"
Computer Vision and Sensor-Based Robots, G. G. Dodd and L. Rossol, Eds.
New York: Plenum Press, 1979, pp. 81-100.

Horn, B. K. P., "Artificial Intelligence and the Science of Image
Understanding," in Computer Vision and Sensor-Based Robots, G. G. Dodd
and L. Rossol (Eds.), N.Y.: Plenum Press, 1979.

Horn, B. K. P. and Schunck, B. G., "Determining Optical Flow,"
Artificial Intelligence, 1981, 17(1-3).

Hsieh, Y. Y. and Fu, K. S., "A Method for Automatic IC Chip Alignment
and Wire Bonding," in Proc. IEEE Computer Soc. Conf. Pattern Recognition
and Image Processing, (Chicago, IL), 197?.

Hsieh, Y. Y. and Fu, K. S., "An Automatic Visual Inspection System
for Integrated Circuit Chips," Comput. Graph. Image Processing, vol. 14,
1980, pp. 293-343.

Huffman, D. A., "Impossible Objects as Nonsense Sentences," in Machine
Intelligence 6, B. Meltzer and D. Michie (eds.), Halsted Press, 1971,
pp. 295-323. (Also in Aggarwal et al. (1977), pp. 333-366.)

Ikeuchi, K. and Horn, B. K. P., "Numerical Shape from Shading and
Occluding Boundaries," Artificial Intelligence, 1981, 17.




Jarvis, J. F., "A method for automating the visual inspection of
printed wiring boards," in Proc. ... Seminar Pattern
Recognition (Univ. Liege, Sart-Tilman, Belgium), Nov. 1977.

Kanade, T., "Model Representations and Control Structures in Image
Understanding," Proc. of IJCAI-77, Cambridge, Aug. 1977, pp. 1074-1082.

Kanade, T., "Recovery of The Three-Dimensional Shape of An Object from a
Single View," AI 17, Aug. 1981, pp. 409-460.

Kashioka, S., Ejiri, M., and Sakamoto, Y., "A Transistor Wire Bonding
System Utilizing Multiple Local Pattern Matching," IEEE Trans. Syst.
Man Cybern., vol. SMC-6, 1976.

Kelly, M. D., "Edge Detection in Pictures by Computer Using Planning,"
in Machine Intelligence 6, B. Meltzer and D. Michie (eds.), Halsted
Press, 1971, pp. 397-409. (Also in Aggarwal et al. (1977), pp. 220-244.)


Krakauer, L. and Pavlidis, T., "Visual Printed Wiring Board Fault
Detection by a Geometrical Method," in Proc. COMPSAC (Chicago, IL),
Nov. 1979, pp. 260-265.

Kruger, R. P. and Thompson, W. B., "A Technical and Economic Assessment
of Computer Vision for Industrial Inspection and Robotic Assembly,"
Proceedings of the IEEE, Vol. 69, No. 12, Dec. 1981, pp. 1524-1538.

Landgrebe, D. A., "Analysis Technology for Land Remote Sensing,"
Proc. of the IEEE, Vol. 69, No. 5, May 1981, pp. 628-642.

Levine, M., "A Knowledge-Based Computer Vision System", in Computer
Vision Systems, Hanson, A., Riseman, E., eds., Academic Press,
New York, 1978.

Lowe, D. G., and Binford, T. O., "The Interpretation of Geometric
Structure from Image Boundaries," Proc. Image Understanding Workshop,
ed., Lee Baumann S., 39-45, 1981.

MacVicar-Whelan, P. J., and Binford, T. O., "Line Finding With Subpixel
Precision," Proc. Image Understanding Workshop, ed., Lee Baumann S., 25-31,
1981.

Marr, D., "Representing Visual Information," in Computer Vision Systems,
A. Hanson and E. M. Riseman, Eds.; New York: Academic Press, 1978,
pp. 61-80.

Marr, D. C., Vision. San Francisco: W. H. Freeman, 1982.

Marr, D. and Nishihara, H., "Visual Information Processing: Artificial
Intelligence and the Sensorium of Sight," Technology Review, October 1978,
pp. 28-47.

Marr, D. and Hildreth, E., "Theory of Edge Detection," AI Memo No. 518,
Artificial Intelligence Laboratory, Mass. Institute of Technology,
April 1979.

Martelli, A., "An Application of Heuristic Search Methods to Edge and
Contour Detection," CACM 19, 1976, pp. 73-83. (Also in Aggarwal et al.
(1977), pp. 217-227.)

Minsky, M. L., "A Framework for Representing Knowledge," in P.
H. Winston (Ed.), The Psychology of Computer Vision. New York:
McGraw-Hill, 1975, pp. 211-277.

Nagao, M., Matsuyama, T., Ikeda, Y., "Region Extraction and Shape Analysis
of Aerial Photographs", Proc. 4ICPR, 1978, p. 620.

Nagao, M., Matsuyama, T., A Structural Analysis of Complex Aerial
Photographs, Plenum, 1980.

Nagel, R. N., Vanderbrug, G. J., Albus, J. S., and Lowenfeld, E.,
"Experiments in Part Acquisition Using Robot Vision," SME Tech. Paper
MS79-784, 1979.

Nevatia, R. and Babu, K. R., "Linear Feature Extraction and Description,"
6IJCAI, 1979, pp. 639-641.









Nevatia, R., Machine Perception, Englewood Cliffs, NJ: Prentice-Hall,
1982.

Nitzan, D., Rosen, C., Agin, G., Bolles, R., Gleason, G., Hill, J.,
McGhie, D., Prajoux, R., Park, W., and Sword, A., "Machine Intelligence
Research Applied to Industrial Automation," 9th Report, SRI International,
August 1979.

Nudd, G. R., "Image Understanding Architectures," Proc. of the
National Computer Conf., 1980, pp. 377-390.

Ohta, Y., "A Region-Oriented Image-Analysis System by Computer," Thesis,
Dept. of Information Science, Kyoto University, 1980.

Parma, C. C., Hanson, A. R., Riseman, E. M., "Experiments in Schema-Driven
Interpretation of a Natural Scene," Univ. of Mass. COINS Tech. Rept. 80-10,
1980.

Perkins, W. A., "A Model-Based Vision System for Industrial Parts,"
IEEE Trans. Comput., vol. C-27, 1978.

Pinkney, H. F. L., "Theory and Development of an On-Line 30 Hz Video
Photogrammetry System for Real-Time 3-Dimensional Control," Proceedings
of the ISP Symposium on Photogrammetry for Industry, Stockholm, August 1978.

Pratt, W. K., Digital Image Processing, New York: Wiley, 1978.

Prewitt, J. M. S., "Object Enhancement and Extraction," in Picture
Processing and Psychopictorics (B. S. Lipkin and A. Rosenfeld, eds.),
New York: Academic Press, 1970, pp. 75-149.

Reddy, R. and Newell, A., "Image Understanding: Potential Research
Approaches," ARPA Image Understanding Workshop, Washington, DC,
1975.

Roach, J. W. and Aggarwal, J. K., "Computer Tracking of Objects Moving
in Space," T-PAMI 1, 1979, pp. 127-135.

Roberts, L. G., "Machine Perception of Three-Dimensional Solids," in
Optical and Electro-Optical Information Processing, J. T. Tippett et al.,
Eds., Cambridge, MIT Press, 1965.

Rosenfeld, A., "Image Pattern Recognition," Proceedings of the IEEE,
Vol. 69, No. 5, May 1981, pp. 595-605.

Rosenfeld, A., "Picture Processing: 1981," Computer Vision Lab Rept.
TR-1134, U. of MD, College Park, Jan. 1982. (Also in CGIP, May 1982.)

Rubin, S., "The ARGOS Image Understanding System," Proc. ARPA IU
Workshop, Nov. 1978, also "The ARGOS Image Understanding System,"
Thesis, Carnegie-Mellon University, 1978.

Sarris, V., "The Development of Robot Vision," Charles River Associates,
Boston, Mass., 1982.

Saund, E., Gennery, D. B., and Cunningham, R. T. (1981), "Visual
Tracking in Stereo," Joint Automatic Control Conference, sponsored
by ASME, University of Virginia, June 1981.






Shapiro, L. G., Moriarty, J. D., Mulgaonkar, P. G., and Haralick, R. M.,
"Sticks, Plates, and Blobs: A Three-Dimensional Object Representation
for Scene Analysis," AAAI, 1980, pp. 28-30.

Shirai, Y., "Analyzing Intensity Arrays Using Knowledge About Scenes,"
in Winston (1975), pp. 93-113.

Shirai, Y., "Recognition of Real-World Objects Using Edge Cue," in
Hanson and Riseman (1978a), pp. 353-362.

Stevens, K. A., "The Visual Interpretation of Surface Contours," AI 17,
August 81, pp. 47-74.

Tenenbaum, et al., "Prospects for Industrial Vision," in Computer
Vision and Sensor-Based Robots, G. G. Dodd and L. Rossol (Eds.),
New York: Plenum Press, 1979.

Tsugawa, S., Yatabe, T., Hirose, T., and Matsumoto, S., "An Automobile
with Artificial Intelligence," 6IJCAI, 1979, pp. 893-895.

Ullman, S., The Interpretation of Visual Motion. Cambridge, Mass.:
MIT Press, 1979.

Vamos, T., Bathor, M., and Mero, L., "A Knowledge-Based Interactive
Robot-Vision System," 6IJCAI, 1979, pp. 920-922.

Waltz, D. L., "Generating Semantic Descriptions from Drawings of
Scenes with Shadows," M.I.T. Tech. Rept. AI-TR-271, Cambridge, MA, Nov. 1972.

Waltz, D., "Understanding Line Drawings of Scenes with Shadows," in
Winston (1975), pp. 19-91.

Wesley, M. A., Lozano-Perez, T., Lieberman, L. I., Lavin, M. A., and
Grossman, D. D., "A Geometric Modeling System for Automated Mechanical
Assembly," IBM Journal of Research and Development 24, 1980, pp. 64-74.

Winston, P. H. (ed.), The Psychology of Computer Vision. McGraw-Hill, 1975.

Woodham, R. J., "Analyzing Curved Surfaces Using Reflectance Map Techniques,"
in Artificial Intelligence: An MIT Perspective, P. H. Winston and
R. H. Brown (eds.), MIT Press, 1979.

Woodham, R. J., "Analyzing Images of Curved Surfaces," AI 17, August 81.

Yachida, M. and Tsuji, S., "A Versatile Machine Vision System for Complex
Industrial Parts," IEEE Trans. Comput., vol. C-26, 1977.







APPENDIX A*
LOW LEVEL FEATURES


The scene to be analyzed is usually sensed by a digital
camera or other similar device, the output of which is normally a
digitized image having an array of brightness values. For some
purposes these brightness values can be operated upon directly to
obtain desired information about the scene, but it is usual to
extract low level features for further computer processing. The
following sections describe the low level features usually
considered for extraction.

A. Pixels (Picture Elements)

Pixels are the individual elements in a digitized array.
They usually represent brightness and perhaps color in a
projection from a three dimensional scene, but could also
represent distance in a range image.

B. Texture

Texture is a local variation in pixel values that repeats in
a regular or random way across a portion of an image or object.
Texture can sometimes be used to identify the object being
sensed, or it can be used for approximating range and surface
orientation in a known object. However, it can also be a noise
source in processing the image.

C. Regions

A region is a set of connected pixels that show a common
property such as average gray level, color or texture in an
image.

*This appendix is based largely on Gennery et al. (1981).


D. Edges and Lines

An edge is a step in pixel values (exceeding some threshold)
between two regions of relatively uniform values. A line is
defined as a thin region of roughly uniform pixel values between
two regions of different but roughly equal pixel values. Line
representations are extracted from edges.

E. Corners

A corner is an abrupt change in direction of a curve.
Corners are useful in data compression approaches to representing
straight edges, and as points for feature matching.


APPENDIX B


EXTRACTING EDGES AND AREAS

A natural first step in analyzing a scene is to convert it
into a sketch, that is, find the edges that separate regions of
differing brightnesses. Edges correspond to abrupt changes in
brightness. Such changes can be identified as places where the
first derivative of the brightness is suddenly high or the second
derivative is zero (see Figure 8). There are various schemes for
doing this, all in some way related to taking brightness
differences between adjacent points.
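
As a minimal sketch of the idea above (not taken from the report; the function names, the sample scan line and the threshold of 20 are illustrative assumptions), the following Python fragment forms first and second differences along one scan line of brightness values. A large first difference marks the step, and the second difference changes sign across it.

```python
def first_difference(profile):
    """Brightness difference between each pair of adjacent samples."""
    return [b - a for a, b in zip(profile, profile[1:])]


def second_difference(profile):
    """Difference of the first differences."""
    return first_difference(first_difference(profile))


def edge_positions(profile, threshold):
    """Indices i where |profile[i+1] - profile[i]| exceeds the threshold."""
    return [i for i, d in enumerate(first_difference(profile))
            if abs(d) > threshold]


# One scan line: a dark region, a step, then a bright region (with mild noise).
scan_line = [10, 10, 11, 10, 60, 61, 60, 60]
steps = edge_positions(scan_line, threshold=20)
curvature = second_difference(scan_line)
```

The small differences caused by noise also produce sign changes in the second difference, which is one reason practical schemes combine a magnitude test with the zero-crossing test.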

A. Extracting Edges

The basic methods for extracting edge and line elements from
images are*:

1. Linear Matched Filtering:

Successively convolve** image windows with a template of the
desired feature and seek the maximum value.

2. Non-linear Filtering:

Convolve windows in the image with a local operator
(weighting function that approximates first or second derivatives
by first or second differences). Examples of operators for doing
this are shown in Table I. In general, each point in the image
is convolved with directional operators in as many directions as
needed. The resultant outputs at each point are combined to
determine the gradient vector (the orientation and magnitude of
the intensity change).


*This section is based largely on methods described in Rosenfeld
(1981, p. 601), Gennery et al. (1981, pp. 2-8 to 2-14), and
Brady (1981A). Additional material can be found in Ballard and
Brown (1982), Binford (1981), and Nevatia (1982).

**Convolve means superimposing an n x n operator over an n x n pixel
area (window) in the image, multiplying corresponding points
together and summing the result.

[Figure 8: plot of intensity I; not reproduced.]



Table I

[Table title illegible in the source.]

Approach: Detect first derivative of brightness, dI/dx

Edge Operators*               Edge Criteria                    Remarks
--------------------------    -----------------------------    --------------------------
Sobel operators               Pixels at which operators        Tuned for limited
                              yield extreme values             range of operation.
    -1   0   1                [partly illegible].
    -2   0   2
    -1   0   1

Nevatia and Babu operators    For each pixel, find angle       Similar operators used
                              operators that yield             for each 30° angle.
  -100 -100   0  100  100     maximum value. Then thin,
  -100 -100   0  100  100     threshold and link (line fit).
  -100 -100   0  100  100
  -100 -100   0  100  100
  -100 -100   0  100  100

*Convolving an image window (about a pixel) with operators such as
those indicated. Operators shown are for finding vertical lines.


3. Local Thresholding

Apply local thresholding and discard responses that do not
lie on borders (between upper and lower threshold regions) and
link responses that do.

4. Surface Fitting - The Hueckel Operator

Fit a surface to the neighborhood of each pixel and compute the
maximum gradient of the surface. Consider as edge points those
pixels having surface maximum gradients above a selected
threshold value. This approach was first devised by Prewitt
(1970). The Hueckel Operator is a popular method for doing this.

5. Rotationally Insensitive Operators:

The Laplacian Operator ($\nabla^2 I = \partial^2 I/\partial x^2 + \partial^2 I/\partial y^2$, related to the
magnitude of the derivative of the intensity gradient) is
insensitive to the direction of a line and yields edge elements
at pixel points where the Laplacian is zero. Thus discrete
approximations to the Laplacian have proved useful in line
finding.

6. Line Following : 

Shirai (1975) devised a line following method that used a 
pair of parameters that varied according to how continuously and 
smoothly elements were found. These parameters determined 
thresholds for accepting a new element according to how close it 
was to the linear continuation of the current line being tracked. 

7. Global Methods


Martelli (1976) devised a global heuristic search that
operates directly on the brightness values. A cost function
is optimized depending on the curvature of the candidate
line and the degree to which the candidate line succeeds in
dividing the image into regions of different brightnesses.
Kelly (1971) used a hierarchical refinement approach, first
finding lines in a coarse image and using the results to
guide line finding in a higher resolution image.

Eberlein (1976) utilized a relaxation approach for linking 
edges found by a local detector, depending on how the edges 
agreed with their local neighbors. This was a parallel 
method that merged the elements into a continuous line. 

The Hough Transform (Duda and Hart, 1973) is a global
parallel method for finding straight or curved lines. For a
straight line, using results from local edge detectors, the
perpendicular distance (ρ) from the line element to the
origin and the angle (θ) of the normal to the line are
determined and mapped into (ρ,θ) space. Peak clusters in
(ρ,θ) space are considered to be straight lines.
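
A minimal sketch of the straight-line case follows. It uses the common voting form in which every edge point votes for all lines through it, with ρ = x cos θ + y sin θ; this is an assumption of the sketch, since the text describes mapping edge elements that already carry an orientation. The function name `hough_lines`, the 1-degree angular step and the unit ρ bins are likewise illustrative choices.

```python
import math
from collections import Counter


def hough_lines(points, n_theta=180, rho_step=1.0):
    """Accumulate votes in quantized (rho, theta) space.

    Each (x, y) point votes once per angle bin k, where
    theta = pi * k / n_theta and rho = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), k)] += 1
    return votes


# Ten edge points on the horizontal line y = 5, plus two stray points.
points = [(x, 5) for x in range(10)] + [(2, 0), (7, 9)]
votes = hough_lines(points)
best = max(votes.values())
peaks = [cell for cell, count in votes.items() if count == best]
```

With coarse bins, several neighbouring cells around (ρ = 5, θ = 90°) tie for the maximum, which is why the text speaks of peak clusters rather than single peaks.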

Fischler, Tenenbaum and Wolf (1981) describe a new paradigm
for detecting and precisely delineating roads and similar
"line-like" structures appearing in low-resolution aerial
imagery. The approach combines "local information from
multiple, and possibly incommensurate, sources, including
various line and edge detection operators, map knowledge
about the likely path of roads through an image, and generic
knowledge about roads (e.g. connectivity, curvature, and
width constraints). The final interpretation of the scene
is achieved by using either a graph search or dynamic
programming techniques to optimize a global figure of
merit."



B. Edge Finding Variations

There are many approaches which can be considered to be
variations, combinations and extensions of the basic approaches
to edge finding considered in section A.

For example, Marr and Hildreth (1979) utilized the fact that
different edges are found depending upon the size of the edge
masks. They also observed that bar masks seem to give more
reliable information than edge masks. They used bar masks of
different panel widths and combined their outputs to reduce
effects of noise and to compute the fuzziness of an edge. They
extended this method based on their observations that intensity
changes are localized in space and in (spatial) frequency. They
note that using a Gaussian filter* optimized localization in both
domains simultaneously. They thus convolved the original image
with the Laplacian of the Gaussian smoothing filter for each
spatial frequency used. Edges were considered to occur where
zero crossings from several spatial frequency channels concurred.
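
The multi-channel zero-crossing idea can be sketched in one dimension. The fragment below is an illustration only: it uses the second derivative of a 1-D Gaussian as a stand-in for the 2-D Laplacian of the Gaussian, and the kernel half-width of 4σ, the border handling, the `eps` guard and the one-sample agreement tolerance are all assumptions of the sketch rather than details from Marr and Hildreth.

```python
import math


def log_kernel(sigma):
    """Sampled second derivative of a 1-D Gaussian, adjusted to sum to zero
    so that regions of constant brightness give no response."""
    half = int(4 * sigma)
    raw = [((x * x) / sigma ** 4 - 1 / sigma ** 2)
           * math.exp(-(x * x) / (2 * sigma ** 2))
           for x in range(-half, half + 1)]
    mean = sum(raw) / len(raw)
    return [v - mean for v in raw]


def filter_signal(signal, kernel):
    """Apply the kernel at every sample, replicating the border samples."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def zero_crossings(response, eps=1e-9):
    """Indices i where the response changes sign between i and i + 1.
    Values within eps of zero are ignored to suppress rounding noise."""
    return [i for i in range(len(response) - 1)
            if (response[i] > eps and response[i + 1] < -eps)
            or (response[i] < -eps and response[i + 1] > eps)]


def multiscale_edges(signal, sigmas, tolerance=1):
    """Keep zero crossings of the finest channel that every other
    channel confirms to within `tolerance` samples."""
    per_scale = [zero_crossings(filter_signal(signal, log_kernel(s)))
                 for s in sigmas]
    return [z for z in per_scale[0]
            if all(any(abs(z - other) <= tolerance for other in zs)
                   for zs in per_scale[1:])]


# A step from brightness 10 to 50 between samples 19 and 20.
signal = [10.0] * 20 + [50.0] * 20
edges = multiscale_edges(signal, sigmas=[1.0, 2.0])
```

Here the first value of `sigmas` plays the role of the finest spatial frequency channel; coarser channels only confirm or reject its zero crossings.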

C. Linking Edge Elements and Thinning Resultant Lines

Due to imperfections in edge element finding techniques, 
situations where edges are poorly defined and noise in the image, 
the primal sketch will usually consist of discontinuous and 
somewhat scattered edge elements. Various schemes (heuristics) 
exist to connect these edge elements together to form lines. 


*An averaging procedure about a pixel in which the influence of
neighboring pixels falls off with distance, according to a
Gaussian distribution.

Long edges or lines can be found either by using an edge
detector (as discussed in the previous section) and linking the
resultant edge elements into a long smooth curve (filling in gaps
and ignoring stray elements), or by a procedure which
accomplishes a similar result by operating directly on the image
data. In either case, if the algorithm operates sequentially by
proceeding along the curve as it links edge elements or pixels,
it often is called a line follower (or tracker), edge follower,
or curve follower. However, algorithms have also been devised
that operate on an effectively parallel or gestalt basis.

Eberlein's (1976) relaxation method yields a thin line
naturally upon convergence. Nevatia and Babu's (1979) approach
accepts as edge portions those candidate edge elements found
that have a maximal gradient value compared to adjacent pixels
with a similar gradient orientation.

When deriving curves from edge data, it is often desirable
to thin the resulting contours. Thinning methods reduce the
contours to a single-pixel width by discarding redundant edges
while maintaining the continuity of the contours. Some methods
such as Eberlein's or Nevatia and Babu's include thinning as an
inherent part of their operation.
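
The maximal-gradient test described above can be sketched as follows. This is a simplified illustration in the spirit of that test, not Nevatia and Babu's published procedure: the gradient orientation is assumed to be already quantized to 0, 45, 90 or 135 degrees, "similar orientation" is taken to mean "same quantized orientation", and the tie-breaking rule and the name `thin` are assumptions.

```python
# (d_row, d_col) step along the gradient for each quantized orientation.
OFFSETS = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}


def thin(magnitude, direction, threshold):
    """Mark a pixel as an edge element if its gradient magnitude reaches the
    threshold and is maximal compared with the two neighbours lying along
    its gradient direction that share its orientation."""
    rows, cols = len(magnitude), len(magnitude[0])
    keep = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            m = magnitude[r][c]
            if m < threshold:
                continue
            dr, dc = OFFSETS[direction[r][c]]
            is_max = True
            for sign in (1, -1):
                rr, cc = r + sign * dr, c + sign * dc
                if not (0 <= rr < rows and 0 <= cc < cols):
                    continue
                if direction[rr][cc] != direction[r][c]:
                    continue
                other = magnitude[rr][cc]
                # Ties are broken in favour of the later pixel, so a plateau
                # two pixels wide still thins to a single pixel.
                if other > m or (sign == 1 and other == m):
                    is_max = False
            keep[r][c] = is_max
    return keep


# A blurred vertical edge: the response is three pixels wide before thinning.
magnitude = [[1, 5, 9, 5, 1] for _ in range(3)]
direction = [[0] * 5 for _ in range(3)]
edge_map = thin(magnitude, direction, threshold=3)
```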

D. Remarks on Edge Finding

Binford (1981) states that it is important to distinguish
between detection of an intensity change and its subsequent
localization. Thus, he considers the zero crossing of the second
derivative of the intensity good for localization of feature
points but not for detection, while the maximum of the first
derivative is good for detection, but not for localization.
Combining the two effects and using linear interpolation,
MacVicar-Whelan and Binford (1981) report being able to localize
edges to sub-pixel accuracy.

Gennery et al. (1981) state that for poor quality images,
the performance of all the various detectors degrades, but in
different ways. None can be considered to be the last word in
edge detectors.

E. Extracting Regions

Many of the edge finding approaches are designed to perform
best when the edges can be approximated reasonably by a series of
linked straight lines. In natural scenes, this approach can lead
to difficulties.

An alternative approach to edge finding is to partition an 
image into regions of approximately uniform brightness 
corresponding to surfaces. Unlike edge linking, "region growing" 
does not require the assumption that the boundaries are straight. 
Region growing can be accomplished by initially partitioning the 
image into elementary regions of constant brightness, and then 
successively merging adjacent regions having sufficiently small 
brightness differences, until only boundaries with strong 
contrast remain. The merging can be done somewhat in parallel by 
computing merge merits for all pairs of adjacent regions, and 
merging all pairs that have mutually highest merit. Another
advantage of region growing over edge finding is that this
technique generalizes more readily to characteristics other
than brightness, such as texture, color, size and shape, which
are important in natural scenes. 
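
A minimal, sequential sketch of region growing by merging is given below. It is an assumption-laden illustration rather than any cited system: every pixel starts as its own elementary region, the merge merit is simply the difference between region mean brightnesses, and one best pair is merged per pass instead of merging all mutually best pairs in parallel. The name `grow_regions` and the parameter `max_diff` are hypothetical.

```python
def grow_regions(image, max_diff):
    """Return a label array in which 4-adjacent regions have been merged
    until every remaining pair of neighbours differs in mean brightness
    by more than max_diff."""
    rows, cols = len(image), len(image[0])
    label = [[r * cols + c for c in range(cols)] for r in range(rows)]
    total = {r * cols + c: float(image[r][c])
             for r in range(rows) for c in range(cols)}
    count = {k: 1 for k in total}

    def adjacent_pairs():
        pairs = set()
        for r in range(rows):
            for c in range(cols):
                for rr, cc in ((r + 1, c), (r, c + 1)):
                    if rr < rows and cc < cols and label[r][c] != label[rr][cc]:
                        pairs.add(tuple(sorted((label[r][c], label[rr][cc]))))
        return pairs

    while True:
        best = None
        for a, b in adjacent_pairs():
            diff = abs(total[a] / count[a] - total[b] / count[b])
            if diff <= max_diff and (best is None or (diff, a, b) < best):
                best = (diff, a, b)
        if best is None:
            return label          # only strong-contrast boundaries remain
        _, a, b = best
        total[a] += total.pop(b)  # region b is absorbed into region a
        count[a] += count.pop(b)
        for r in range(rows):
            for c in range(cols):
                if label[r][c] == b:
                    label[r][c] = a


image = [[10, 11, 50],
         [10, 12, 51],
         [9, 52, 50]]
labels = grow_regions(image, max_diff=5)
```

Note that the boundary between the two resulting regions is not straight, which is the property the text contrasts with edge linking. The repeated full rescans keep the sketch short but would be too slow for real images.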

The simplest vision systems use a global threshold to obtain
a binary image - an approach commonly used in industrial vision
systems. Thresholding can also be local or even dynamic. When
histograms* of image pixel intensities are used, it is usual to
dissect the image by thresholding at a value in a valley of the
histogram so as to give strong peaks on either side of the
threshold value. This region splitting approach can be applied
recursively until no more regions can be split. Ohlander et al.
(1978) used this approach, computing histograms in each of nine
colors and thresholding on the parameter that yielded the best
histogram for splitting.
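
One simple way to pick a valley is sketched below for a single brightness histogram. The peak-finding heuristic (take the most populated level, then the level that maximizes count times squared distance from it) and the tie-breaking toward the midpoint are assumptions of this sketch, not the method of Ohlander et al.; the function name `valley_threshold` is hypothetical, and a clearly bimodal histogram is assumed.

```python
from collections import Counter


def valley_threshold(pixels, levels=256):
    """Return a threshold lying in the valley between the two main peaks
    of the intensity histogram."""
    hist = Counter(pixels)
    counts = [hist.get(v, 0) for v in range(levels)]
    peak1 = max(range(levels), key=lambda v: counts[v])
    # Second peak: well populated and far from the first.
    peak2 = max(range(levels), key=lambda v: counts[v] * (v - peak1) ** 2)
    lo, hi = sorted((peak1, peak2))
    middle = (lo + hi) / 2
    # Least populated level between the peaks; ties go to the midpoint.
    return min(range(lo + 1, hi),
               key=lambda v: (counts[v], abs(v - middle)))


# Mostly dark background (about 20) and a bright object (about 200).
pixels = [20] * 30 + [22] * 10 + [100] * 2 + [200] * 25 + [198] * 5
threshold = valley_threshold(pixels)
binary = [1 if p > threshold else 0 for p in pixels]
```

Applying the same function again to the pixels on each side of the threshold gives the recursive splitting described in the text.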

NASA has employed spectral analysis for segmenting
regions in LANDSAT imagery (cf. Landgrebe, 1981).


*Frequency counts of the occurrence of each intensity in an
image.


APPENDIX C


A. The Computer Vision Paradigm

Starting with an image of a scene, the goal of a computer
vision system is to identify the objects and their relationships
in the scene. To accomplish this, it is customary for the system
to segment the image into surfaces or edges associated with the
objects, and then use the resulting information, together with
domain knowledge, to generate the desired scene description. In
this appendix we will review techniques used to do this
segmentation.

B. An Early Bottom-Up System

A landmark program in machine perception was developed by
Roberts (1965) to recognize various three dimensional polyhedral
object configurations. Roberts employed an image-dissector
camera to look at blocks-world scenes involving blocks, wedges,
hexagonal prisms, or objects formed by sticking these together.
His program could determine the location, orientation, and
dimensions of the objects. The program could demonstrate its
"understanding," by displaying a drawing of the scene observed
from any desired viewpoint.

Roberts' program first found the places in the images where
brightness or shading changed abruptly, corresponding to points
on the edges of the object. Then by linking these points, it
produced a line drawing of the scene. The line drawing was
interpreted by finding triangles, quadrilaterals, and hexagons,
which suggested possible objects (triangles suggest wedges,
etc.) and eventually accounted for all the lines and junctions as
edges and corners of objects. From the resulting appearance of
the object in the image, the program was able to compute its
dimensions, location, and orientation.

C. Problems with Bottom-Up Systems 

Barrow and Tenenbaum (1981, p. 570) note that a major
problem with the sequential program organization used by Roberts
and many of his successors:

     ...is the inherent unreliability of segmentation.
     Some surface boundaries may be missed because the
     contrast across them is low, while shadows,
     reflections, and markings may introduce extra lines and
     regions. The interpretation phase, when presented with
     a corrupted segmentation, may be unable to produce an
     explanation, and hence cause the entire system to fail.

     Partitioning an arbitrary image into regions
     corresponding to objects or object surfaces is
     fundamentally impossible without exploiting scene
     models. First, there is no basis for deciding which
     image features are significant at the level of objects
     and which are not. Second, there is no good pictorial
     criterion for filling in missing features. Third, the
     very notion of an object is ill defined, being largely
     determined by convention and experience.

D. Interpretation-Guided Segmentation

Several research teams tried to overcome these problems by
integrating the segmentation and interpretation phases. One
simple approach used was to try to recognize objects from partial
matches obtained using models and then to try to verify the
results by attempting to find evidence that supported image
features previously missed.

Techniques were also developed for region-based systems. The
general approach is:

1. For regions with uniform attributes such as intensity,
color or texture, assign sets of possible object
interpretations based on knowledge of possible object
surfaces and the contextual constraints associated with
assignments in adjacent regions. For instance, a road
cannot be surrounded by sky.

2. Merge adjacent regions with comparable interpretations. 

3. Reevaluate interpretations based on contextual 
constraints associated with the new adjacent regions. 

4. Continue alternating merging and interpreting until all 
adjacent regions have disjoint interpretations 
unviolated by contextual constraints. 

Both line-based and region-based interpretation-guided 
segmentation systems have been devised that have performed well 
in a variety of complex scene domains. However, the approach is 
not suitable for a general-purpose vision system as it depends on 
prior knowledge of expected objects. Unknown objects cannot be 
recognized or even described. Thus for unknown objects, levels 
of scene descriptions below the level of complete objects are 
needed. 

E. Use of General World Knowledge to Guide Segmentation

Marr and Nishihara (1978) observe that as the primal sketch is
typically a large and unwieldy collection of data, the next step
is to decode it--traditionally by "... a process called
segmentation whose purpose is to divide a primal sketch, or more
generally an image, into regions that are meaningful, perhaps as
physical objects." It makes sense to use any general knowledge
that might help in the interpretation. An example of such
knowledge is information on the physical nature of edges of
objects.

Huffman (1971) and Clowes (1971) devised an approach to 
enable the interpretation of perfect line drawings of polyhedral 
objects without having to resort to heuristics. They recognized 
that each line in the picture represented either a convex edge, 
a concave edge, or an occluding edge in a three-dimensional 
scene. From this, they constructed a catalog of possible vertices 
with allowable line labellings. A scene could then be analyzed
by starting at one vertex and proceeding through the line drawing
performing a tree search, limiting the number of possible line
labellings at each step according to the catalog, until a
consistent labelling for the entire scene is obtained. Waltz
(1975) further extended this technique to include shadows and 
cracks. His catalog included several thousand possible vertex 
types. He used a relaxation-type procedure to decide on the 
correct labelling for each line according to the possibilities in 
the catalog. The resultant procedure converges rapidly (usually 
to a unique interpretation) regardless of the complexity of the 
scene. 
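
The relaxation-type filtering step can be sketched as a constraint propagation over junctions that share lines. Everything specific in the fragment below is a hypothetical illustration: the three-junction drawing, the tiny candidate sets and the symmetric labels "+" and "-" stand in for Waltz's catalog of several thousand vertex types, and occluding-edge labels, whose direction must be reconciled between the two ends of a line, are left out.

```python
def waltz_filter(junctions, candidates):
    """Repeatedly discard any junction labelling that gives some line a
    label which no surviving labelling of a junction at the other end of
    that line can agree with.

    junctions:  {junction name: tuple of line names, in a fixed order}
    candidates: {junction name: set of label tuples, one label per line}
    """
    changed = True
    while changed:
        changed = False
        for j, lines in junctions.items():
            for labelling in list(candidates[j]):
                supported = True
                for pos, line in enumerate(lines):
                    for k, other_lines in junctions.items():
                        if k == j or line not in other_lines:
                            continue
                        kpos = other_lines.index(line)
                        if not any(o[kpos] == labelling[pos]
                                   for o in candidates[k]):
                            supported = False
                if not supported:
                    candidates[j].discard(labelling)
                    changed = True
    return candidates


# Hypothetical toy drawing: three junctions joined pairwise by three lines.
junctions = {"A": ("ab", "ac"), "B": ("ab", "bc"), "C": ("ac", "bc")}
candidates = {
    "A": {("+", "+"), ("-", "+")},
    "B": {("+", "-")},
    "C": {("+", "-"), ("-", "-")},
}
remaining = waltz_filter(junctions, candidates)
```

In this toy case each junction is left with exactly one labelling. In general the filtering only prunes, and a search over the surviving labellings may still be needed.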

A move toward a more general approach to the problem of
interpreting feature point segments as lines and edges has
recently been made by Binford (1981) and Lowe and Binford (1981).
In their scheme, a segment is interpreted as a space curve, and
constraints are formulated based on coincidence, and those
situations in which a curve corresponds to a true edge or
bounding contour.

A comprehensive approach to deriving a physical sketch of a
scene from one or more images has been taken by Fischler et al.
(1982). They use a priori knowledge of global and extended
constraints to guide the segmentation and interpretation process.
Their approach involves modeling physically meaningful information
such as the imaging process, the scene geometry and elements of
the scene content. They utilize knowledge about such factors as
the camera model, vanishing points, geometric distortion, ground
plane, geometric horizon, skyline, semantic context (urban or
rural scene, etc.), physical surface models and edge
classification.





APPENDIX D

2-D REPRESENTATION, DESCRIPTION AND RECOGNITION

This appendix presents a number of 2-D representations and
descriptions useful for further processing and recognition.

A. Pyramids

A pyramid data structure represents an image at several
levels of resolution simultaneously. The base of the pyramid is
the original full resolution image, usually assumed to be an n x n
square array. The next level of the pyramid is typically formed
by partitioning the image into non-overlapping 2 by 2 cells and
mapping (usually by average gray level) the four pixels in each
cell to a single pixel in the next level*. This is repeated,
level by level, until the image is compressed into a single pixel
at the top level. The usefulness of pyramids lies in being able
to extract features at an appropriate level of resolution.
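
A minimal sketch of this construction follows, assuming a square image whose side n is a power of two and using the average gray level as the mapping. The function names are illustrative.

```python
def reduce_level(image):
    """Map each non-overlapping 2 x 2 cell to the average of its four pixels."""
    n = len(image)
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 4
             for c in range(0, n, 2)]
            for r in range(0, n, 2)]


def build_pyramid(image):
    """Return the list of levels, from full resolution down to one pixel."""
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels


image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [4, 4, 4, 4],
         [4, 4, 4, 4]]
pyramid = build_pyramid(image)   # 4 x 4, 2 x 2 and 1 x 1 levels
```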

B. Quadtrees

A quadtree representation of an n x n image is obtained in a
top-down manner by recursively splitting the image into
quadrants, the quadrants into subquadrants, etc. The process
continues until all pixels in a quadrant are uniform with respect
to some feature (such as gray level). The terminal leaves of a
quadtree are uniform regions of varying sizes, thus being a
useful first phase in segmenting an image into regions.
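
The recursive splitting can be sketched as below, assuming a square image whose side is a power of two and taking "uniform" to mean identical pixel values. Representing a leaf by its value and an internal node by a four-element list (NW, NE, SW, SE) is a choice made for this sketch only.

```python
def build_quadtree(image, row=0, col=0, size=None):
    """Return the pixel value if the block is uniform, otherwise a list of
    four subtrees in the order NW, NE, SW, SE."""
    if size is None:
        size = len(image)
    first = image[row][col]
    if all(image[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first                      # terminal leaf: uniform region
    half = size // 2
    return [build_quadtree(image, row, col, half),
            build_quadtree(image, row, col + half, half),
            build_quadtree(image, row + half, col, half),
            build_quadtree(image, row + half, col + half, half)]


image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 1],
         [0, 0, 1, 1]]
tree = build_quadtree(image)
```

Three of the four quadrants are uniform and become leaves at the first split; only the lower right quadrant is divided further.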

*Other partitionings and mappings are common.


C. Statistical Features of a Region

Once an image has been segmented, a description of each
region, or blob, can be generated as a list of statistical
features. These features typically include perimeter, area, c.g.
(center of gravity), first and second order moments, color, etc.
The individual blob descriptors are linked to form a tree data
structure which represents nesting relationships. The parent of
any blob in the tree is the adjacent blob which completely
surrounds it. Recognition is performed by matching the statistical
features with those of stored prototypes. The SRI Vision Module
and GM's CONSIGHT use this approach.
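
A sketch of such a feature list and of nearest-prototype matching is given below. It is not the SRI Vision Module's or CONSIGHT's actual feature set: the choice of features, the boundary-edge definition of perimeter, the unweighted squared-distance match and the prototype values are all assumptions made for illustration.

```python
def blob_features(mask):
    """Compute simple statistical features of the blob of 1-pixels in mask."""
    pixels = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    area = len(pixels)
    cr = sum(r for r, _ in pixels) / area      # centre of gravity (row)
    cc = sum(c for _, c in pixels) / area      # centre of gravity (column)
    filled = set(pixels)
    # Perimeter: number of pixel sides that face a non-blob neighbour.
    perimeter = sum((r + dr, c + dc) not in filled
                    for r, c in pixels
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    # Second order central moments about the centre of gravity.
    mu_rr = sum((r - cr) ** 2 for r, _ in pixels) / area
    mu_cc = sum((c - cc) ** 2 for _, c in pixels) / area
    return {"area": area, "centroid": (cr, cc), "perimeter": perimeter,
            "mu_rr": mu_rr, "mu_cc": mu_cc}


def classify(features, prototypes, keys=("area", "perimeter")):
    """Return the name of the stored prototype closest in feature space."""
    def distance(name):
        return sum((features[k] - prototypes[name][k]) ** 2 for k in keys)
    return min(prototypes, key=distance)


mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
# Hypothetical stored prototypes.
prototypes = {"bar": {"area": 6, "perimeter": 10},
              "square": {"area": 9, "perimeter": 12}}
features = blob_features(mask)
name = classify(features, prototypes)
```

In practice the features would be scaled before comparison so that no single feature dominates the distance.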

D. Boundary Curves 

The boundary of a region can be represented by a chain of 
straight lines and arcs. The resulting compressed boundary 
descriptions are sometimes referred to as "chain codes" or
"concurves." Gennery et al. (1981, p. 3-2) note that "The main
advantage of the concurve representation is that objects may be 
recognized on the basis of partial views by matching a subset of 
the lines and arcs in a model concurve with the image data." 

E. Run-Length Encoding

For a binary image, it is possible to segment the image into
edges and regions by sequentially scanning the image and
recording the edge points (where pixels change from zero to one
or vice versa). This process of reducing a binary image to a set
of edge points is called run-length encoding, and has been
successfully used in the SRI Vision Module and a number of
sophisticated commercial vision systems derived from that module.
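
For a single scan line the encoding can be sketched as follows. Storing the first pixel value together with the list of transition columns is an assumption of this sketch (it is what makes the row recoverable), and the function names are illustrative.

```python
def encode_row(row):
    """Return (first pixel value, columns at which the value changes)."""
    changes = [c for c in range(1, len(row)) if row[c] != row[c - 1]]
    return row[0], changes


def decode_row(first, changes, width):
    """Rebuild the binary row from its first value and transition columns."""
    out, value, prev = [], first, 0
    for c in changes + [width]:
        out.extend([value] * (c - prev))   # one run of identical pixels
        value, prev = 1 - value, c
    return out


row = [0, 0, 1, 1, 1, 0, 1, 1]
first, changes = encode_row(row)
restored = decode_row(first, changes, len(row))
```

Encoding every row in this way reduces the binary image to its edge points, from which connected regions can be assembled row by row.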

F. Skeleton Representations and Generalized Ribbons

In this approach, a planar region is represented by a
skeleton which consists of the medial line (locus of points
equidistant from the boundaries of the region) and the
perpendicular distance from the boundary for each point on the
medial line. In some cases, a complex region can be constructed
as the union of these generalized ribbons (the 2-D version of
generalized cones described in Appendix F).

G. Representation by a Concatenation of Primitive Forms 

A region can be built up from a collection of squares, 
rectangles or other shapes. The "Maximal Block" approach uses a 
union of squares of various sizes. 

H. Relational Graphs 

An image that has been segmented into regions can be 
described in terms of a relational graph, whose nodes represent 
regions and whose arcs represent properties (such as shape and 
size) and relations (such as "in front of" and "adjacent to").
Corresponding views of known objects can be similarly
represented, and recognition can be achieved by matching the
graphs.

I. Recognition

Recognition consists of matching a description derived from 
an image to a description of a stored model. Recognition can be 
accomplished by correlation, which for binary data reduces to 
template matching. A more elaborate approach is statistical 
pattern classification using features such as described in 
Section C. Relaxation and syntactic analysis approaches 
(described elsewhere) have also been used. Fischler and Bolles 
(1982) suggest "random sample consensus" as a paradigm for 
selecting the model that provides the best match to the data and 
for computing the best values of the free parameters. 
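
The random sample consensus idea can be illustrated with the simplest possible model, a straight line fitted to points containing gross errors. The sketch below is a simplified illustration and not Fischler and Bolles's full formulation: the iteration count, the distance tolerance, the fixed random seed and the final least-squares refit (which assumes the line is not vertical) are all assumptions.

```python
import random


def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring gross outliers.

    Repeatedly pick two points at random, count how many points lie within
    `tol` of the line through them, keep the largest such consensus set,
    then refit by least squares to that set.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2              # normal to the sampled line
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue                         # identical points: no line
        c = -(a * x1 + b * y1)
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b * y + c) / norm <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    n = len(best_inliers)
    mx = sum(x for x, _ in best_inliers) / n
    my = sum(y for _, y in best_inliers) / n
    sxx = sum((x - mx) ** 2 for x, _ in best_inliers)
    sxy = sum((x - mx) * (y - my) for x, y in best_inliers)
    slope = sxy / sxx
    return slope, my - slope * mx, best_inliers


# Ten points on y = 2x + 1 and three gross errors.
points = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -20), (1, 15)]
slope, intercept, inliers = ransac_line(points)
```

A plain least-squares fit to all thirteen points would be pulled toward the three gross errors; the consensus step removes them before the final fit.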





APPENDIX E

RECOVERY OF INTRINSIC SURFACE CHARACTERISTICS

A. Basic Approach

As indicated earlier, it is helpful in many cases, to assist
in finding 3-D surfaces and volumes for interpretation, to go
beyond the 2-D representation of edges and regions to a
representation proposed by Marr (1978) of MIT, called the 2.5-D
sketch, consisting of surface distances and orientations. Such a
sketch can be constructed from the surface characteristics which
are intrinsic to the scene and are not dependent upon
idiosyncrasies of viewpoint of the sensor.

Barrow and Tenenbaum (1981, pp. 581-582) indicate that these
intrinsic characteristics of surfaces are appropriately
represented as a set of arrays in registration with the image
array. Each array corresponds to a particular intrinsic
characteristic such as surface reflectance, surface orientation,
incident illumination and range. Each array contains values for
its intrinsic characteristic at the surface element visible at
the corresponding point in the sensed image. It also explicitly
indicates boundaries due to discontinuities in value or gradient
of the characteristic. Such arrays have been referred to as
intrinsic images.

Figure 9 is an artist's conception of one possible set of
intrinsic images, corresponding to a monochrome image of a simple
scene. The images are shown as line drawings, but in fact would
contain values at every point. The solid lines represent
discontinuities in the scene characteristic; the dashed lines
represent discontinuities in its derivative. The distance image
gives the line of sight range from the center of projection to
each visible point in the scene. The reflectance image gives the
albedo (the ratio of total reflected to total incident
illumination) at each point. The orientation image consists of
vectors representing the direction of the surface normal at every
point. The integrated incident illumination from all sources is
given by the illumination image.












The central problem in recovering the intrinsic
characteristics from the image is that the desired information is
confounded in the sensory data. The observed light intensity at
a single point could result from an infinitude of combinations of
illumination, reflectance and orientation. The key to recovery
lies in exploiting constraints derived from assumptions about
the nature of the scene and the physics of the imaging process.
For example, as surfaces are continuous except at boundaries, we
can expect surface characteristics (reflectance, orientation and
range) to also be continuous. Similarly, incident illumination
also varies smoothly over a scene except at shadow boundaries.
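
The confounding can be made concrete with a small Python sketch.
It assumes a simple Lambertian (matte) surface model, which is an
assumption of this example and not a statement from the cited
work; the function name and the numerical values are hypothetical.

```python
import math

def observed_intensity(illumination, reflectance, incidence_deg):
    """Assumed Lambertian model: I = illumination * reflectance * cos(incidence angle)."""
    return illumination * reflectance * math.cos(math.radians(incidence_deg))

# Three different (illumination, reflectance, incidence angle) combinations.
combos = [(1.0, 0.3, 0.0), (0.6, 0.5, 0.0), (1.0, 0.6, 60.0)]
intensities = [observed_intensity(*c) for c in combos]
```

All three combinations give essentially the same intensity (about
0.3), so a single intensity value cannot by itself separate
illumination, reflectance and orientation.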

Barrow and Tenenbaum (1981, p. 589) propose the following
four-step model for using interacting constraints in a
relaxation type process for simultaneously recovering the primary
intrinsic characteristics from a brightness image:

1) find the brightness discontinuities in the input
image;

2) determine the physical nature of the
discontinuity;

3) assign boundary values for intrinsic characteristics
along the edges, based on the physical
interpretation;

4) propagate from these boundary values into the
interiors of regions, using continuity
assumptions.
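
The first and last of these steps can be sketched in one dimension
as follows. This is a hedged illustration, not the authors'
procedure: steps 2 and 3 are replaced by boundary values supplied
by hand, the relaxation is plain neighbour averaging, and all
names and numbers are hypothetical.

```python
def find_discontinuities(brightness, threshold):
    """Step 1: indices i where brightness jumps between pixel i-1 and pixel i."""
    return [i for i in range(1, len(brightness))
            if abs(brightness[i] - brightness[i - 1]) > threshold]

def propagate(known, length, sweeps=200):
    """Step 4: fill interior values by repeated neighbour averaging.

    `known` maps pixel index -> boundary value; those values stay fixed.
    """
    values = [0.0] * length
    for i, v in known.items():
        values[i] = v
    for _ in range(sweeps):
        updated = values[:]
        for i in range(1, length - 1):
            if i not in known:
                updated[i] = 0.5 * (values[i - 1] + values[i + 1])
        values = updated
    return values

edges = find_discontinuities([10, 10, 10, 40, 40, 40], threshold=5)

# Hand-assigned boundary values (standing in for steps 2 and 3),
# for example a range of 10 at one edge of a region and 20 at the other.
interior = propagate({0: 10.0, 4: 20.0}, length=5)
```

With the two end values fixed, the averaging converges to a smooth
(here linear) variation between them, which is the simplest form
of the continuity assumption.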

Many different approaches to recovering shape from image
characteristics have been explored, as represented in the
following sections.

B. Shape from Shading 

Barrow and Tenenbaum (1978) describe a low level method of
estimating relative distance and surface orientation from a
single image. They use heuristics based on the rate of change of
brightness across the image.

Ikeuchi and Horn (1981) have formulated a second order
differential equation which Horn calls the "image irradiance
equation." This equation relates the orientation of the local
surface normal of a visible surface, its surface reflectance
characteristics, and the lighting, to the intensity value
recorded at the corresponding point in the image.

C. Stereoscopic Approach

Gennery et al. (1981, pp. 6-1 to 6-4) describe various
stereoscopic approaches to finding range. They observe that the
basic stereo approach uses triangulation between two or more
views from different positions to determine distance. However,
stereo techniques differ in the way in which matching is done
between pictures, particularly in the kind of entities that are
matched. The two major approaches are area correlation and
matching lines of maximum intensity changes (edge-based stereo).

They report (p. 6-3) that, "Scenes of man-made objects often 
are not highly textured but contain sharp brightness edges at 
boundaries of objects and at intersections of planar faces. For 
such scenes, area correlation does not work very well. Instead, 
it is usually better to detect features in each image and to 
match these features." 
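
Once a feature has been matched between two views, the
triangulation step itself is simple. The Python sketch below
assumes two identical pinhole cameras with parallel optical axes,
which is an assumption of this example rather than a description
of any surveyed system; the function and parameter names are
hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Range Z = f * B / d for parallel cameras, with d = x_left - x_right in pixels."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of both cameras must have positive disparity")
    return focal_px * baseline_m / disparity

# Focal length 800 pixels, cameras 0.1 m apart, feature matched at
# column 420 in the left image and column 400 in the right image.
z = depth_from_disparity(800.0, 0.1, 420.0, 400.0)
```

The hard part of stereo, as the text notes, is the matching that
produces `x_left` and `x_right`; the sketch takes the match as
given.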

D. Photometric Stereo 

In this approach, the light source illuminating the scene is
moved to different known locations, and the orientation of the
surfaces deduced from the resulting intensity variations
(Woodham, 1979, 1981).
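
A minimal Python sketch of the idea follows, under assumptions
that are not taken from Woodham's papers: the surface is taken to
be Lambertian, three distant light sources with known unit
directions are used, and the intensity at a pixel under source k
is modelled as albedo * (L_k . n). All names and values are
hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(matrix, rhs):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    solution = []
    for col in range(3):
        replaced = [[rhs[r] if c == col else matrix[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def normal_from_intensities(light_dirs, intensities):
    """Return (albedo, unit normal) from three intensities under three known lights."""
    g = solve3(light_dirs, intensities)      # g = albedo * normal
    albedo = math.sqrt(sum(v * v for v in g))
    return albedo, [v / albedo for v in g]

lights = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]]
# Intensities that a surface facing the camera with albedo 0.5 would give.
albedo, normal = normal_from_intensities(lights, [0.5, 0.4, 0.4])
```

Each light position contributes one linear equation per pixel, so
three non-coplanar positions are enough in this simplified model
to recover both the orientation and the albedo.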





E. Shape from Texture

Brady (1981A, p. 08) reports that, "Of the modules which
seem to bridge the gap between the Primal Sketch and the Surface
Orientation Map, none has received quite as much attention from
Psychologists as the computation of surface orientation and depth
from texture gradients." Various methods for computing texture
gradients are possible, and from these gradients orientation can
be deduced.

F. Shape from Contour 

Barrow and Tenenbaum (1980) have suggested a method for
interpreting curved line drawings as three-dimensional surfaces.
To interpret a two-dimensional curve, a three-dimensional curve
projecting to it is computed that minimizes a combination of
variation in curvature and departure from planarity. Other
approaches to this problem are given by Draper (1981), Kanade
(1981) and Stevens (1981).

G. Shape and Velocity from Motion

Brady (1981A, p. 96) provides a review of efforts to recover
shape from motion for the case of rigid bodies. He reports that
Ullman (1978) was the first to treat this issue. Ullman considered
the problem of establishing a correspondence between the Primal
Sketches in two successive image frames. Ullman also studied the
problem of computing the structure of a rigid body from the
correspondences of a small number of points in a number of views
and found that remarkably few of each are required to compute
rigid three-dimensional structure.

Brady (1981A, p. 70) defines "optical flow" as the
distribution of velocities of apparent movement caused by
smoothly changing brightness patterns. Horn and Schunck (1981)
have proposed a method for computing "optical flow" by
differentiating the brightness distribution in successive images
with respect to time.
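
The role of the time derivative can be illustrated in one
dimension. The sketch below is not Horn and Schunck's method,
which works in two dimensions and adds a smoothness constraint; it
only assumes that brightness is conserved as a pattern moves, so
that I_x * u + I_t = 0 and hence u = -I_t / I_x. The function name
and the test frames are hypothetical.

```python
def flow_1d(frame_a, frame_b):
    """Estimate velocity (pixels per frame) at interior pixels from I_x * u + I_t = 0."""
    velocities = []
    for x in range(1, len(frame_a) - 1):
        ix = (frame_a[x + 1] - frame_a[x - 1]) / 2.0   # spatial derivative
        it = frame_b[x] - frame_a[x]                   # temporal derivative
        velocities.append(None if ix == 0 else -it / ix)
    return velocities

# A brightness ramp I(x) = 2x that shifts one pixel to the right between frames.
frame_a = [2.0 * x for x in range(6)]
frame_b = [2.0 * (x - 1) for x in range(6)]
velocities = flow_1d(frame_a, frame_b)
```

Where the spatial derivative is zero the velocity cannot be
determined from this equation alone, which is why additional
constraints are needed in practice.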





APPENDIX F 

HIGHER LEVELS OF REPRESENTATION 

The basic form for the higher levels of representation is
the 3-D model. This is an object-centered representation that
describes the object in a convenient way, as in the following
examples.

A. Volumetric Models 

1. Generalized Cones 

Agin and Binford (1973) introduced the concept of
generalized cones (also called generalized cylinders). A
generalized cone is defined by a space curve, called the spine or
axis, and a planar cross section normal to the axis. A "sweeping
rule" describes how the cross section changes along the axis.
Complicated objects can often be represented by a concatenation
of generalized cones.
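
As a hedged illustration, the Python sketch below represents a
very restricted generalized cone: a straight axis, a circular
cross section, and a sweeping rule given as a radius function of
distance along the axis. These restrictions and all names are
assumptions of the example, not part of Agin and Binford's
definition.

```python
import math

def generalized_cone_volume(axis_length, radius_at, steps=1000):
    """Integrate the cross-section area pi * r(s)^2 along a straight axis (midpoint rule)."""
    ds = axis_length / steps
    return sum(math.pi * radius_at((k + 0.5) * ds) ** 2 * ds
               for k in range(steps))

# Sweeping rule for an ordinary cone: radius shrinks linearly from 2 to 0
# over an axis of length 6.
volume = generalized_cone_volume(6.0, lambda s: 2.0 * (1.0 - s / 6.0))
```

Changing only the sweeping rule (for example to a constant radius)
turns the same description into a cylinder, which is the sense in
which a few parameters can describe a family of shapes.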

2. Wire Frame Models 

Various investigators have represented 3-D objects by means 
of wire frame models in which the wires correspond to edges or 
boundaries of cross sections. Stick figure models are a related 
representation.

3. Polyhedral Models 

Wesley et al. (1980) report on a geometric modeling system
developed at IBM to describe complicated mechanical parts. The
object is represented by polyhedral primitives which are combined
as required by the operations of union, difference and
intersection. In the IBM system, objects and assemblies are
represented in a graph structure that indicates part-whole
relationships, attachment, constraint, and assembly. Also
included are physical properties of objects and positional
relationships between objects. The system can determine the
appearance of an object for an arbitrary view. This information
provides the potential for use by a computer vision recognition
system to guide the search for features to match an image to the
model.
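
The combination of primitives by union, difference and
intersection can be sketched with point-membership tests. This is
an illustration only and does not reflect how the IBM system is
implemented; axis-aligned boxes stand in for general polyhedral
primitives, and all names are hypothetical.

```python
def box(lo, hi):
    """Primitive: axis-aligned box, returned as a point-membership test."""
    return lambda p: all(l <= c <= h for l, c, h in zip(lo, p, hi))

def union(a, b):
    return lambda p: a(p) or b(p)

def intersection(a, b):
    return lambda p: a(p) and b(p)

def difference(a, b):
    return lambda p: a(p) and not b(p)

# A 10 x 10 x 1 plate with a square hole cut through its middle.
plate = box((0, 0, 0), (10, 10, 1))
hole = box((4, 4, 0), (6, 6, 1))
part = difference(plate, hole)

in_solid = part((1, 1, 0.5))   # a point in the plate material
in_hole = part((5, 5, 0.5))    # a point inside the hole
```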

4. Combining 1D, 2D, and 3D Primitives

Shapiro et al. (1980) describe objects in terms of the
primitives: sticks, plates and blobs. Relations are given on how
the parts connect, their size, and spatial relationships.

5. Planes and Ellipsoids 

Gennery (1980) produced a method for describing 3-D outdoor
scenes. The ground surface was approximated by one or more
planes or paraboloids, and objects lying on the ground were
approximated by ellipsoids.

6. Sets of Prototype Volumes 

Efforts in Computer Aided Design and Computer Aided
Manufacturing (CAD/CAM) often represent objects by combining a
small set of prototype volumes such as spheres, blocks and
triangular prisms.

B. Symbolic Descriptions

The various parts of an object in a scene may be represented
by graphs in which the nodes are the objects and the
arcs are the relations (such as above, to the right of, behind,
surrounded by, part of, larger than, etc.) and intrinsic
attributes (e.g., small, flat, etc.).
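
A minimal Python sketch of such a graph follows. The scene
contents, the dictionary and tuple layout, and the helper names
are hypothetical choices made for illustration.

```python
# Nodes carry intrinsic attributes; arcs are (subject, relation, object) triples.
nodes = {
    "lamp": {"small"},
    "desk": {"flat"},
    "book": {"small", "flat"},
}
arcs = [
    ("lamp", "above", "desk"),
    ("book", "above", "desk"),
    ("book", "to the right of", "lamp"),
]

def related(arcs, relation, target):
    """Names of nodes that stand in `relation` to `target`."""
    return sorted(s for s, r, o in arcs if r == relation and o == target)

def with_attribute(nodes, attribute):
    """Names of nodes that have the given intrinsic attribute."""
    return sorted(n for n, attrs in nodes.items() if attribute in attrs)

above_desk = related(arcs, "above", "desk")
flat_things = with_attribute(nodes, "flat")
```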

Barrow and Tenenbaum (1981, p. 876) observe that, "Symbolic
models are appropriate for natural objects (e.g., trees) that are
better defined in terms of generic characteristics (e.g., larger,
green, leafy) than their precise shape."

C. Procedural Models

Rosenfeld (1981, p. 604) defines a procedural model as any
process that generates or recognizes images. An important class
of such models is grammatical or syntactic models. Pratt (1978,
pp. 574-578) discusses such syntactic processes. He observes that
syntactic methods have been proved feasible for simple models, but
notes that it is not clear yet whether or not these techniques
can be extended to general classes of images.



APPENDIX G 


HIGHER LEVELS OF INTERPRETATION 

Barrow and Tenenbaum (1981, pp. 591-593) outline how
interpretation might proceed based on intrinsic images. They
observe that intrinsic images provide scene information on a
point by point basis in a viewer-centered coordinate frame.
Higher levels of interpretation, such as object recognition,
require a more global representation in a viewpoint-independent
coordinate frame. Surfaces and volumes are obvious candidates
for representations following from intrinsic images.

An interpretation-guided segmentation approach based on 
structural prototypes is a possible mechanism for deriving 3-D 
surfaces and volumes from intrinsic images. 

Once a scene description has been obtained in terms of
surface and volume primitives, geometric models can be used to 
generate similar primitives, which can then be matched by a 
search process to obtain object recognition and location. It is 
often convenient to use graph structures for representing scene 
parts. As scene descriptions are typically fragmented and 
include many objects, some of which may be occluded, it is 
necessary to match parts of the scene graph with parts of object 
graphs. As such subgraph matching can be combinatorially 
explosive, much work has been done on algorithms to handle such 
matching in complex scenes. 

Barrow and Tenenbaum suggest that perhaps the best way to
defeat the combinatorics of search is to decompose object models
hierarchically into components. These components can then be
independently matched, and combined and checked for consistency
afterward. Using this approach, the complexity of matching tends
to increase additively rather than exponentially.



APPENDIX H 


TRACKING 

Gennery et al. (1981, pp. 5-1 to 5-3) survey the real-time
tracking problem, observing:

The goal of object tracking is to process 
sequences of images in real time to describe the motion 
of one or more objects in a scene. Often real time 
implies processing every image from a TV camera 
operating at 30 Hz. In other words, an image is 
digitized, features are extracted from the image, the 
object or objects are located in the image, and 
position and velocity estimates are updated 30 times a 
second, although in practice slightly slower rates are 
sometimes used. At the present time, the approaches
which achieve real-time operation rely on simplifying 
assumptions about the nature of the scene, track very 
few objects in a given scene, and incorporate varying 
levels of special-purpose hardware designed for the 
particular tracking algorithm... 

Since successive images are only 1/30 second apart
in time, the appearance of the object will change very
little from image to image. The object can be modelled
adaptively as it was last seen by the tracker, with the
expectation that a good match between the object model
and the features in the current image is available.
Furthermore, the location of the object in the image
can be predicted very accurately by using the latest
available position and velocity estimates coupled with
the short elapsed time between images. As a result,
the search window need only be large enough to contain 
the object up to a few pixels uncertainty. This limits 
the required computation to a manageable level and, 
more importantly, greatly reduces the probability of a 
false match occurring... 

Real-time implementations typically rely on
features which can be computed directly from the image
without resorting to actual 3-D measurements of object
features.
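
The prediction step described in the passage above can be sketched
as follows. This is an illustrative Python fragment under simple
assumptions (constant velocity between frames, a rectangular
window, a fixed uncertainty margin); it is not taken from any of
the surveyed systems, and all names and values are hypothetical.

```python
def predict_position(position, velocity, frame_rate=30.0):
    """Advance the last position estimate by one frame at constant velocity."""
    x, y = position
    vx, vy = velocity            # pixels per second
    return x + vx / frame_rate, y + vy / frame_rate

def search_window(predicted, object_size, margin=3):
    """Return (left, top, right, bottom): the object extent padded by a few pixels."""
    cx, cy = predicted
    half_w, half_h = object_size[0] / 2.0, object_size[1] / 2.0
    return (cx - half_w - margin, cy - half_h - margin,
            cx + half_w + margin, cy + half_h + margin)

predicted = predict_position((100.0, 50.0), (60.0, -30.0))
window = search_window(predicted, object_size=(20, 10))
```

Only the pixels inside `window` need to be examined in the next
frame, which is what keeps the computation manageable and makes
false matches less likely.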

Table IX summarizes the various approaches surveyed by
Gennery et al. It will be observed that a variety of approaches
are possible using either area correlation or feature matching.
However, no final optimum system has yet been devised.



Table IX

Visual Tracking Approaches

System Developer: Griffin et al. (1978)
Purpose: Object tracking for closed-loop guidance of JPL
breadboard Mars-Rover vehicle.
Approach: Gray level correlation of a window in current
successive images of an object. Implemented in software.
Comments: Immune to background changes if tracking window
confined to target.

System Developer: Pinkney (1978)
Purpose: Control of shuttle manipulator for grasping objects
tracked.
Approach: Uses a single camera to track four man-made markers on
object to derive object position and orientation relative to
manipulator.

System Developer: Brooks (1980)
Purpose: Supervisory control of a teleoperator manipulator.
Approach: Stereo cameras to track markings on an object.

System Developer: Nitzan et al. (1979)
Purpose: Track moving objects for feedback to an industrial
robot.
Approach: Use SRI vision module.

System Developer: Roach & Aggarwal (1979)
Purpose: Track objects in "blocks world"
Approach: Blocks are located and matched (using a three level
scheme) based on predictions from stored internal representations
of blocks discovered in previous images.
Comments: Works well in blocks world.

System Developer: Fennema & Thompson (1979)
Purpose: Object tracking
Approach: "Gradient Intensity Transform Method". Time variations
in intensity and the spatial gradient are determined and recorded
for each pixel in image. A Hough transform method is used on the
intensity variations and gradients to determine object velocity.
Comments: Requires smoothing (and therefore accuracy degradation)
for procedure to work well.


Table IX (cont.)
Visual Tracking Approaches

System Developer: Tsugawa et al. (1979)
Purpose: Detect position of road features to automatically guide
a car.
Approach: Differentiate analog video signals from two cameras and
stereo-match contrast edges.

System Developer: Hirzinger & Snyder (1980)
Purpose: Object tracking
Approach: Analog video signal processed by special purpose
hardware to detect significant contrast areas inside a
programmable tracking window. The position of the object is
considered to be the centroid of the extremes of the contrast
points.
Comments: This contour-based approach easily fooled in scenes of
moderate complexity.

System Developer: Gilbert et al. (1980)
Purpose: Real-time identification and tracking of missiles and
aircraft.
Approach: Uses four microprocessors as follows:
1. performs histogram analysis of window in image to classify
pixels as (1) belonging to target (0) not belonging to target.
2. sums target pixels horizontally and vertically to identify
target.
3. handles image rotation and camera control.
4. evaluates goodness of match at each tracking iteration and
adapts system as needed.
Comments: Assigns a confidence level to each match and relies on
prediction when match is poor.

System Developer: Saund et al. (1981)
Purpose: Object tracking
Approach: Use feature-matching to an internal model adjusted by a
least squares fit.
Comments: Rejects extraneous features not predicted by model.


Source: Derived from Gennery et al. (1981, pp. 5-1 to 5-7) 


Real-time tracking is important for manipulation; for
guidance for applications such as recovering satellites or
free-flying payloads; for grasping moving objects (such as parts in
an industrial environment); for assembling objects such as
machinery or electrical appliances; and for building space
structures. It is also important for target acquisition and
tracking or for locking onto a feature in situations such as
planetary flybys and astronomical or earth observations. And, as
is to be expected, it is also applicable for vehicle and missile
guidance.





APPENDIX I


Additional Tables of
Model-Based Vision Systems




Table III-a

Model-Based Vision Systems

Developer: Ballard, Brown and Feldman (1978), Univ. of Rochester
Purpose: Answering Queries about Images
Sample Domains: Locating ships at docks
Locating ribs in chest x-rays

Approach

System is structured in levels (similar to VISIONS).
-The model
-Sketchmap relating model to image
-Images at different levels of resolution

Query determines level of detail

Queries take the form of user-written executive programs

Control involves synthesis of a sketch

Modeling & Representation

Knowledge and model in terms of semantic networks, consisting of
nodes and spatial constraints

Templates are used to describe shape

User models objects in image domain (not 3D domain)

Remarks

Uses a fixed and known viewpoint

User must code an executive match procedure for the particular
task domain

Special purpose system


Table III-b

Model-Based Vision Systems
VV (Verification Vision System)

Developer: Bolles (1976), SRI
Purpose: Inspection and Visual Control in Repetitive
Manufacturing Tasks
Sample Domains: Location of a mechanical part in an automatic
assembly work station

Approach

Relies on 3D relationships of observables to locate a mechanical
part accurately, based on slight deviations from an expected
location and orientation

1) The user chooses potential operator/feature pairs

2) The system applies the operator to several sample pictures and
gathers statistical information on their effectiveness

3) System ranks operators and predicts cost of accomplishing task

4) System applies operators to task in order of their cost
effectiveness, until desired confidence is reached.

Uses a generalized least squares algorithm and maximal clique*
finding to match the model to identified features

*Maximal cliques are maximal matches of portions of graphs of
image and model descriptions

Modeling & Representation

Limited 3D models made up of surface point features and their
locations in 3D space.

Remarks

Depends on small correlation windows as features. Thus it is
restricted in viewpoint





Table III-c

Model-Based Vision Systems
ARGOS

Developer: Rubin (1978), CMU
Purpose: Identify Objects in Images
Sample Domains: Labelling buildings in a city scene

Approach

Has 3D knowledge of positions of the buildings translated into
adjacency information to guide the search for labellings of
pixels

ARGOS does not segment - it labels. It works with pixels or
regions. Much depends on spectral labeling.

Search is a very local pixel-based "locus" search, a
generalization of HARPY's (network speech understanding) "beam
search" to 2D photo interpretation.

Modeling & Representation

Internal model is a 3D model of the city. Model used to generate
all possible views

Stores multiple representations of buildings in terms of such
things as location, texture, color, orientation and gross shape
features, all gleaned from training examples

Remarks

Viewpoint dependent

Mostly relies on adjacency relations

Not readily generalizable



Table III-d

Model-Based Vision Systems

Developer: Garvey (1976), SRI
Purpose: Locate Known Objects in an Image
Sample Domains: Office environment

Approach

System uses simple local features rather than structured shape
descriptions

Strategy is to:
-use windows to acquire image samples which might belong to the
object
-hypothesize the object from the sample
-validate the hypothesis
-erect a boundary around the object

Coarse to fine strategy:
-large objects found first, reducing search area for
spatially-related smaller objects

Top-down approach

Modeling & Representation

Programmed interactively

Objects are shown to the system by outlining them in an image

Objects are automatically characterized by conjunction of
histograms of local surface attributes such as: hue, orientation,
range and height, and relationships between surfaces

Remarks

Performance rests strongly on having depth data and surface
orientation derived from depth

Does not use general shape information



Table III-e

Model-Based Vision Systems
MSYS

Developer: Barrow and Tenenbaum (1976), SRI
Purpose: Identify known objects in a scene
Sample Domains: Typical objects in a room
System also used to drive the IGS interpretation-guided
segmentation system

Approach

An interpretation-guided segmentation system

Simulated range data is used to determine 3D locations,
orientations of regions and their spatial relations

Matcher uses spatial relationships as a strong constraint in
matching data from image to model

Modeling & Representation

Models objects in terms of height, orientation and 3D spatial
relationships between objects

Remarks

Relies on a particular viewpoint

No shape information is used




Table III-f

Model-Based Vision Systems

Developer: Kanade (1977)
Purpose: Find objects in a scene
Sample Domains: Outdoor scenes with buildings viewed from eye
level

Approach

... scene in either intensity or range

... to match observed patches against modeled ...les

Modeling & Representation

Uses a 2-1/2D scene domain

Objects are represented as image regions

Shape and spatial relations describe the region

Objects have multiple representations from multiple viewpoints,
but these must be explicitly described by the user

Remarks

Image oriented

Viewpoint is only pseudo-independent


Table III-g

Model-Based Vision Systems

Developer: Nagao (1978, 1980), Kyoto U
Purpose: Label areas and objects in aerial photographs taken in
several spectral bands
Sample Domains: Countryside, Suburban

Approach

First do edge-preserving smoothing

Segment images into regions which are continuous in spectral
properties

Using histograms and adaptive thresholding in each spectral band,
extract cue regions:
-large homogeneous areas
-elongated regions
-shadow and shadow-making regions
-vegetation regions
-water regions
-high contrast texture regions

Analyze each cue region by an object detection program specific
to region type

Feed summary of properties of regions back to subsystems

System control tries to resolve conflict labels and to deal with
unlabeled regions. The most reliable labels are chosen for a
region. If a region can't be labelled, system activates a split
and merge process to correct faulty segmentation

Modeling & Representation

Use shadows to give information about height

Shadow-making regions are regions adjacent to shadows with a long
common boundary

Elongated objects include roads, rivers and railroad lines

Vegetation areas have small ratios of red to IR

Water identified by spectral properties

High contrast regions are woods and residential areas

Residential regions are those with strong gradients in two
orthogonal directions

Houses found in candidate residential areas by a sequence of
house routines, starting with rectangular-shaped shadow-making
regions

Remarks

Well-crafted system tailored to multispectral aerial photographs

Segmentation primarily dependent on color

Shadow identification not general or reliable

Interpretation not general
-3D only from shadows
-weak use of shape
-interpretation suitable for large areas, not human scale objects
for which shape is important


Table III-h

Model-Based Vision Systems

Developer: Ohta (1980), Kyoto U.
Purpose: Semantic labelling of regions in color images of outdoor
scenes
Sample Domains: Urban scenes with buildings, trees, streets and
cars as viewed from ground level

Approach

Forms regions by splitting using thresholds from histograms of
color parameters. The color parameters chosen are 3 algebraic
combinations of red, blue and green:
[(r+b+g)/3, r-b, (2g-r-b)/4]

Textured regions determined separately based on the Laplacian
exceeding threshold in 3x3 windows.

A plan is generated by an initial bottom-up coarse region
segmentation.

A symbolic description of the scene is made by a top-down
analysis of the bottom-up interpretation using a production
system with knowledge of the world represented as a set of rules.

Decisions made by the top-down process cause the bottom-up
process to be reactivated to reevaluate the plan.

The top-down computation proceeds from a coarse to fine analysis,
in a scene phase and an object phase.

Modeling & Representation

Data structure includes regions, boundaries and vertices.

Regions are represented by: area, mean intensity of r,b,g, degree
of texture, perimeter length, c.g., number of holes, etc.

Road model
-subobjects: cars, shadows
-made of: asphalt, concrete
-properties: horizontally long, below horizon
(car: horizontally long, dark, above road)

Sky model
-not touching lower edge of image
-shining
-blue or grey
-touching upper edge
-linear boundary on lower edge

Tree model
-heavy texture
-made of leaves

Building model
-subobjects: windows
-made of concrete, tile or brick
-many holes
-many straight lines
-linear upper boundary with sky

Rules for the bottom-up plan are unary properties of objects and
binary relations between objects

The world model is a network of objects, materials and concepts
(scene schemas).

Remarks

System does well overall

One of a few examples of reasonable performance on scenes of
moderate complexity on a set of somewhat different scenes

Quality of segmentation is weak

Models are weak

Approach is ineffective in many situations in which fine details
determine object labels.


Table III-i

Model-Based Vision Systems

Developer: Shirai (1978), ETL, Tokyo
Purpose: Recognition and location of common objects from light
intensities (grey levels) in an image.
Sample Domain: Generic desk-top objects

Approach

Uses an edge-finding process which extracts edges of curved
objects.

Analysis of the scene starts from the most obvious object.

Top-down approach.

Recognizes objects using a hierarchy of features.
-Find the main feature to get clues for the object.
-Find a secondary feature to verify the main feature (and object
identity) and to determine the range to the object.
-Determine the range and find the other lines of the object.

Find secondary small objects after large objects are found

Modeling and Representation

Describes edges by straight lines or elliptic curves

Objects are modeled and represented by primary and secondary
features:

Lamp
Primary: contour of lamp shade
Secondary: pair of vertical lines corresponding to trunk; contour
of base

Bookstand
Primary: long vertical lines clustered in a rectangular region
Secondary: lines connected to main feature

Small objects (pipe, pen, etc.)
Primary: shape and size of contour
Secondary: details of shape, and light intensity changes

Remarks

Edge finder adequate for task

Can't deal effectively with texture

Uses only image models, not object models

No organization of related edges

Top-down approach [illegible] with only a few objects


Table III-j

Model-Based Vision Systems

Developer: Levine (1978), McGill U., Canada
Purpose: Develop a Modular Computer Vision System to Experiment
with Different Picture Analysis Strategies
Sample Domains: Suburban Outdoor Scenes
Office Scenes

Approach

A three level system:

First level segments pictures into regions without scene context.

Second level has two phases:

Local phase matches all stored image templates (of feature
vectors) with observed regions, using A* graph search

Global optimization phase uses dynamic programming to merge
regions and assign labels to them based on model-driven spatial
relationships

Highest level includes management-type relational data base of
image-oriented scenes, and a data-driven production system of
what actions to take depending on what appears in the image
representation in short term memory (STM).

STM contains list of regions and a confidence-ordered list of
their interpretations. (It resembles the blackboard of HEARSAY.)
Implicit actions are invoked when a region matches an object in a
scene model with a confidence level above threshold.

Modeling & Representation

Low level processing is approximately in order of decreasing size
using a pyramidal data structure

Uses feature vectors throughout

Features are stored in three classes according to decreased
importance in reducing search time.

1. Includes minimum bounding rectangle and its areas.

2. Intrinsic features: intensity, hue, saturation and its area.

3. Includes six moment invariants as a rough measure of shape,
and detailed shape from a set of Fourier coefficients for the
outline (used only in final template evaluation).

Remarks

Segmentation weak - based on a gradient operator

System is viewpoint dependent


Table III-k

Model-Based Vision Systems
3-D Mosaic Project

Developer: Herman, Kanade and Kuroe (1982), CMU

Purpose: To incrementally acquire a 3-D model of a complex
urban scene from images.

Sample Domains: Near vertical views of Federal Buildings area of Wash., D.C.

Approach

Uses stereo analysis to construct partial wire frame models from scene
vertices and edges (which have been previously extracted from the images
using conventional edge point finding, thinning, and fitting of straight
lines).

Constructs a structure-graph to represent partial constraints on 3-D
structure.

From the wire frame descriptions, a surface-based model representing an
approximation to the scene is generated using domain-specific knowledge of
building shapes - such as flat roofs and vertical sides.

Uses a relaxation type process to merge wire frame models generated from
different stereo pairs.

Modifications, additions or deletions to the structure-graph model are made
as new information is found.

Modeling & Representation

Constructs wire frame models from images.

Uses a structure-graph representation to model surfaces in the scene as
polyhedra. Nodes represent primitive topological elements (faces, edges,
vertices, objects, and edge groups) or primitive geometric elements
(planes, lines and points). Two types of links are used - part-of link
(relation between two topological nodes) and the geometric constraint link
(representing the constraint relation between a geometric and topological
node).

Generates a surface-based 3-D scene model from the wire frame descriptions
utilizing heuristics of building shapes.

Remarks

Domain knowledge used is viewpoint dependent.

Can generate 3-D perspective views of reconstructed buildings from any
desired viewpoint.

Uses truth maintenance procedures when hypotheses need modification.




Table III-l

Model-Based Vision Systems

Developer: Faugeras and Price (1980), USC

Purpose: Semantic Description of Aerial Images

Sample Domains: Search for known objects in an aerial image

Approach

Construct a network of image segments (using region-based image
segmentation and linear feature extraction).

Solution is by stochastic matching (relaxation) of the image network to a
portion of the model network.

The approach takes the form of a constrained optimization graph search.
Names are assigned to units with matching probabilities above 80%.

Modeling and Representation

Represent image segments by properties:
-average color
-simple texture measure
-position
-orientation
-simple shape measures

Relations between image segments include adjacency, proximity and relative
position.

Model is described by a semantic network, with the nodes being segments
projected into the image plane. The arcs are positional relationships.

Image is also represented by the same type of semantic network as model.

Remarks

Uses image-dependent models restricted in viewpoint.

Segmentation is relatively weak.




APPENDIX J

Tables of Commercially Available Vision Systems


Optomation II
Approx. $50K

This is a sophisticated high speed vision system designed primarily for
inspection and measurement. (The system can do measurements along user
defined lines.)

Optomation uses up to 4 G.E. solid state CID cameras for input. The system
is based on a multi-distributed microprocessor architecture partitioned to
take maximum advantage of hardware, firmware and software modules to achieve
high speed, flexibility and low cost.

The system first thresholds the image to binary. The thresholded images are
held in a 256x256x8 bit high speed dynamic random access memory, normally
operated as four 256x256 2 bit pages.

The thresholded image is next "windowed" to establish spatial limits for
data to be further processed. The windows are raster-scanned to find edge
points (where pixels change from 0 to 1 or vice versa). Using a patented
corner point encoder the system observes where edges change direction
(45° or greater) and labels these "corner points." Only the corner points
are stored. This is all accomplished in a single pass.

The stored corner points are then correctly associated such that each object
or item (closed edge set) is reconstructed and stored in an item file. The
feature extractor then analyzes these item files and extracts key features
such as area, centroid, bounding rectangle, distances, angles, etc.
(similar to the SRI module).

Approximately 50,000 corner points per second can be processed. The
processor can simultaneously operate on 4 scenes, composed of 64 objects
with up to 3300 corner points. The system can thus handle up to 15 images
per second for each of 4 asynchronous cameras.

Optomation II can be readily programmed by the user in Basic-like VPL
(Vision Planning Language).
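
As a rough illustration of the feature extraction step described above, the following Python sketch computes area, centroid and bounding rectangle for one item given as an ordered list of corner points. It is not the Optomation implementation; the function name `polygon_features`, the dictionary keys and the use of the standard polygon (shoelace) formulas are assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_features(corners: List[Point]) -> Dict[str, object]:
    """Area, centroid and bounding rectangle of a closed outline.

    `corners` is a hypothetical "item file": the corner points of one
    closed edge set, listed in order around the boundary.
    """
    n = len(corners)
    if n < 3:
        raise ValueError("a closed outline needs at least three corner points")

    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]  # wrap around to close the outline
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate outline with zero area")

    signed_area = twice_area / 2.0
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return {
        "area": abs(signed_area),  # sign only encodes traversal direction
        "centroid": centroid,
        "bounding_rectangle": (min(xs), min(ys), max(xs), max(ys)),
    }


features = polygon_features([(0, 0), (4, 0), (4, 2), (0, 2)])
print(features)
```

Because only the corner points are needed, the cost of this step grows with the number of corners rather than with the number of pixels in the item, which is consistent with the throughput being quoted in corner points per second.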



Commercially Available Industrial Vision Systems
Table VI (continued)

Columns: Company; System; Comments; Verification; Inspection; Recognition;
Manipulation

Machine Intelligence Corp.
Sunnyvale, CA

VS 100 Vision System
Approx. $35K

This sophisticated system is based on the SRI Vision Module System and is
easily programmed using a light-pen controlled menu on TV monitor. Can
also be programmed in BASIC on a DS-100 Development System.

VS100P, a portable version, is also available.

VS 110 Vision System

This system adds a Programmable Image Overlay feature to the VS100, so that
by masking, or differencing, a precisely located part can be inspected for
flaws or tolerance verification.

UNIVISION

This system is basically the VS100 used as a pattern recognition system to
provide a vision sensing capability for UNIMATE PUMA robots. It consists
of a vision processor, graphic display monitor, light pen, camera(s) and
UNIMATION's VAL software and hardware interface. The system is designed
to operate in real time. UNIVISION can be trained to recognize a maximum
of 9 different objects with up to 12 non-occluded parts in the scene at one
time. For each part, 13 distinguishing features can be generated including
area, perimeter, C.G., number of holes, and maximum and minimum radii.

DS-100 Development System
Approx. $95K

Allows easy vision program development by engineers with little computer
experience. Shields the user from most computer-related details while
providing sophisticated development tools, such as 20 megabytes of disk,
a file system, several screen-oriented editors, a compiler, and several
debugging aids. Programs developed on the DS-100 can be executed in the
factory on a VS-100 or VS-110. Price includes an integral vision system,
training, and user-support documentation.




Automatix, Inc.
Burlington, Mass.

Autovision II
Approx. $35K

This sophisticated system has many of the aspects of the SRI Vision Module.
It can also window images for template matching or feature extraction.

System has frame buffer storage and can handle sixteen gray levels.

System is user programmable in a customized high level robotics system
language called RAIL.

The system is either obtainable as a stand-alone, or is available as an
option to Automatix robots for assembly and arc welding (for seam-tracking
guidance utilizing structured light).




Octek, Inc.
Burlington, Mass.

Robot Vision Module 2200
Approx. $9.9K (without camera or primary computer)

This is a sophisticated computer system designed to be interfaced to a DEC
or Data General computer. 50 FORTRAN S/W programs come with it to provide
the capability of the SRI Vision Module. Can inspect up to 5 parts/sec. using
a modified SRI algorithm and feature vectors (having components such as area,
moments, etc.). Can handle up to 50 objects in a scene at once.

System incorporates a frame-grabber which can handle 4 images at once. System
can also do signal averaging and "kernel manipulation" (spatial filtering,
template matching or image subtraction).

System can do histograms, measure objects in terms of length, width and angle.
Can also do pseudo color with gray scale.

Octek is now supplying a HITACHI 320x240 resolution solid-state miniature
4-bit gray level camera. Octek also supplies CCD and other solid-state
cameras, as well as monochrome and RGB monitors.

20/20 Vision Development System
Approx. $45K

An integrated self-contained package containing a camera, Image Processor,
11/23 computer, B & W monitor, printer and cabinet, plus FORTRAN subroutines
for users to implement their own applications.



Object Recognition Systems
New York, N.Y.

Computer-based vision systems that use pattern recognition techniques for
high-speed verification, packaged for industrial use.

The standard products have "on-the-fly" hardware (no frame-grabber). Systems
sample and filter with analog signals. Using firmware generated windows,
can zoom in on particular areas. General approach is to extract gray scale
samples, then extract features and compare with stored patterns. Also
available is a picture differencing algorithm (with average of previous
frames) for change detection. Systems employ gray scale, edge detection, and
texture information, as appropriate.
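
The picture differencing idea mentioned above (comparing a new frame against an average of previous frames) can be sketched in Python as follows. This is only an illustration under stated assumptions: the vendor's algorithm is proprietary and not described in this report, so the exponential running average, the function names `update_average` and `changed_pixels`, and the parameters `alpha` and `threshold` are all hypothetical choices.

```python
from typing import List, Tuple

Frame = List[List[float]]  # gray scale samples, one list per image row


def update_average(average: Frame, frame: Frame, alpha: float = 0.25) -> Frame:
    """Blend a new frame into the running average of previous frames."""
    return [
        [(1.0 - alpha) * a + alpha * f for a, f in zip(avg_row, new_row)]
        for avg_row, new_row in zip(average, frame)
    ]


def changed_pixels(average: Frame, frame: Frame,
                   threshold: float = 30.0) -> List[Tuple[int, int]]:
    """Return (row, column) positions where the frame departs from the average."""
    changes = []
    for r, (avg_row, new_row) in enumerate(zip(average, frame)):
        for c, (a, f) in enumerate(zip(avg_row, new_row)):
            if abs(f - a) > threshold:
                changes.append((r, c))
    return changes


background = [[10.0, 10.0], [10.0, 10.0]]
incoming = [[10.0, 12.0], [10.0, 200.0]]
print(changed_pixels(background, incoming))
background = update_average(background, incoming)
```

Differencing against an average rather than against the single previous frame makes the comparison less sensitive to noise in any one frame, at the cost of responding more slowly to genuine slow changes.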

System Q
Approx. $20K

For alpha-numeric legibility verification. Does edge of characters detection
(proprietary) on the fly. Matches against stored prototype (constructed of
samples).

ScanSystem 100
Approx. $20K

Real time vision system for verification and inspection - single pattern
library - 300 images per minute.

ScanSystem 200
Approx. $25K

Real time vision system for verification, inspection and recognition -
160 pattern library - 300 images per minute.

ScanSystem 1000
Approx. $65K

Designed for keyboard verification. Train it with a good keyboard under
joystick control to bring each key into view of an area-type scanner.

Forms a file of table addresses, windows and associated features. Can inspect
keyboards at rate of one/minute. Can also inspect populated PCB boards for
correctness of component placement. Checks for color (via gray scale) and
height and width of characters. Checks for maximum correlation and extracts
transform-coded edge features and matches them using a pattern distance
measure. Also generates statistical quality control information.

i-bot Vision System
Approx. $30K

Vision system designed specifically to assist robots to remove individual
objects from a jumbled bin of parts. Module can guide the pickup of jumbled
cylindrical and spherical shaped objects from a bin, using a modification
of the U. of Rhode Island peak reflection technique. Development is underway
on bin-picking for a greater variety of shapes. A 3-D vision system using
photometric stereo is also under development.



Spectron Engr. Inc.
Denver, CO

Used primarily for dimensional measurement and defect detection and
evaluation.

Detector based on photo diode arrays. Uses cameras, processors and
controllers of their own design. Front end optics, light sources and
software tend to be application specific. Their strength is in high
resolution applications.

Have a library of subroutines to draw upon in devising custom applications.

CE 400/410
Wire, Optical Fiber Diameter Measurement System
Approx. $10K

Uses a flash source to overcome vibration. Measurements at 30 to 60/sec.
Runs automatically. Can be customized for process control.

CE 400/410
Print Photocopier Scanning Microdensitometer

Looks for print sharpness and uniformity in evaluating photocopiers.

Can also evaluate paperstock. Differences adjacent elements in letters to
obtain a mean square difference in reflectivity.

Customized Dimensional Measurement Systems
Approx. $20K to $35K

First digitally corrects data for distortion, photodiode variations etc.
using 256 gray levels. Uses image reconstruction techniques to enhance
resolution.





mpany 


jozatrix 

isworth. 




CA 


VPU (Video Processor Unit)
Approx. $15K
Approx. $20K with camera and monitor

Converts a video analog TV camera output to a binary 500x400 image. Basic
capability of Vision Module is programmed in firmware (PROM), enabling
the system to calculate features such as c.g., area, edges, diameter,
etc., in 50 milliseconds. System can operate in a stand-alone mode or
interface to a computer.

VK-2000 Measurement System
Approx. $?0K for a completely automated measurement station

System can window and automatically focus in on a surface point so that it
can measure to 0.0001" in a 4" cubic region. Using a 40 power microscope,
measurements to 7 millionths of an inch are attainable.



Company 


■rweld 

I C 5 

. Michigan 


it Vision 
items 

vllle, NY 


Opto-Sense
Approx. $40K and up

Uses a multiple windowing technique, setting up subsets of rectangles around
portions of interest such as holes. Sets threshold limits and does area
counts above or below threshold in a window to see if portion is within
tolerance. Requires part be properly oriented.

Can be upgraded to incorporate SRI Vision Module features.

Can be further upgraded with a full-frame grabber (up to 256 levels of
gray) and customized software added, for more sophisticated applications.
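
The windowed area-count check described for Opto-Sense can be sketched in Python as below. This is a minimal sketch of the general technique, not the vendor's firmware; the function names `count_above` and `within_tolerance`, the window convention (top, left, bottom, right, with bottom and right exclusive) and the numeric limits are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]            # gray levels, one list per image row
Window = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def count_above(image: Image, window: Window, threshold: int) -> int:
    """Count pixels inside the window whose gray level exceeds the threshold."""
    top, left, bottom, right = window
    return sum(
        1
        for r in range(top, bottom)
        for c in range(left, right)
        if image[r][c] > threshold
    )


def within_tolerance(image: Image, window: Window, threshold: int,
                     min_count: int, max_count: int) -> bool:
    """Accept the feature if its bright-pixel area lies within the limits."""
    return min_count <= count_above(image, window, threshold) <= max_count


# A 4x4 image with a bright 2x2 "hole" back-lit through the part.
part = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
hole_window = (0, 0, 4, 4)
print(within_tolerance(part, hole_window, threshold=128, min_count=3, max_count=5))
```

Because the windows are fixed in image coordinates, a shifted or rotated part moves the feature out of its window, which is why a scheme of this kind requires the part to be properly oriented.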





Primarily Custom Vision Systems for Military and Industry

A company strong point is proprietary techniques in the use of structured
light and triangulation to discern the 3D coordinates of an object under
view. This volumetric vision ("solid photography") approach can capture
an enormous amount of data very quickly (all the data required to define
a man's head in 0.9 seconds).

They also utilize area type sensors for robot vision.

Have made a sensor system for Cummins Diesel to measure very large
unfixtured engine block castings. The system makes 1250 measurements in
35 minutes and compares dimensions with those stored in a computer.
The system is designed to achieve accuracy of 0.0001".

Now building a robotic welding system for use with a Milacron T3 robot
to weld automobile frames. This is a two part system where first the
weld line is scanned at 180"/sec. to determine seam locations and width.
Using this information, the seam is then adaptively welded.

Now making a standard vision system with a 1" or 4" field of view with an
accuracy of 0.001" for inspection applications, but which could also be
configured for welding.


Cognex
Boston

Dataman
Approx. $25K for basic OCR system.

System uses a DEC PDP-11/23 and other off-the-shelf hardware such as cameras.
Dataman derived from research Dr. Shillman did at M.I.T. on how humans
recognize patterns. The resulting proprietary algorithms, as implemented,
can read badly degraded alpha-numerics. System can read virtually any
alpha-numerics humans can (will reject unreadables rather than make errors).

Basic system is for optical code reading (font specific). System can also be
used for print quality assurance (legibility) and quality control.




■mpany 


Industries 
odonia, CH 




Ham Scan 3000
Approx. $6K (without camera and monitor)

This is a verification and inspection system that operates on an analog
image either by template matching or by analog integration.

Can do windowing, or measurements on a single programmed line. Windowing,
thresholding, line placements, etc. are set using factory customized software
in the integral micro-processor.

System can be trained by showing, or by manual use of switches.

Ham Scan 1000
Approx. $4K

Does template matching with adjustable allowable deviations using gray scale
and a single window.

Ham Scan 2000

Similar to the 1000, except that it has a double window.




APPENDIX K

GLOSSARY *


• Artificial Intelligence (AI) Approach: An approach that has its emphasis
  on symbolic processes for representing and manipulating knowledge
  in a problem solving mode.

• Bar Operators: Convolution masks to detect second derivatives of image
  brightness in particular directions.

• Binary Image: A black and white image represented as zeros and ones, in
  which objects appear as silhouettes.

• Blackboard Approach: A problem solving approach whereby the various
  system elements communicate with each other via a common working
  data storage called the blackboard.

• Blob: A connected region in a binary image.

• Blocks World: Scenes consisting of three dimensional polyhedral object
  configurations. A simple artificial world used to explore computer
  vision concepts.

• Bottom Up (Data Driven): Refers to the sequential processing by a
  vision system, beginning with the input image and terminating in an
  interpretation.

• CAD/CAM: Computer-aided design / computer-aided manufacture.

• Chain Code: A boundary representation which starts with an initial
  point and stores a chain of directions to successive points.

• Computer Vision (Computational or Machine Vision): Perception by a
  computer, based on visual sensory inputs in which a concise description
  is developed of a scene depicted in an image. It is a knowledge-based,
  expectation-guided process that uses models to interpret sensory data.
  Used somewhat synonymously with image understanding and scene analysis.

• Concurve: A boundary representation consisting of a chain of straight
  lines and arcs.

• Convolve: Superimposing an n x n operator over an n x n pixel area
  (window) in the image, multiplying corresponding points together and
  summing the result.

• Corner: An abrupt change in direction of a curve.

• Correlation: A correspondence between attributes in an image and a
  reference image.


* As yet no standard definitions exist, so that the definitions listed here
can be considered to be somewhat imprecise.


• Description: A symbolic representation of the relevant information,
  e.g., a list of statistical features of a region.

• Digitized Image: A representation of an image as an array of brightness
  values.

• Domain: The sphere of concern. The task world. A set of allowable
  inputs.

• Edge: A change in pixel values (exceeding some threshold) between two
  regions of relatively uniform values. Edges correspond to changes
  in brightness which can correspond to a discontinuity in surface
  orientation, surface reflectance or illumination.

• Edge Operators: Templates for finding edges in images.

• Edge-Based Stereo: A stereographic technique based on matching edges
  in two or more views of the same scene taken from different positions.

• Features: Simple image data attributes such as pixel amplitudes, edge
  point locations and textural descriptors, or somewhat more elaborate
  image patterns such as boundaries and regions.

• Feature Vector: A set of features of an object (such as area, number of
  holes, etc.) that can be used for its identification.

• Feature Extraction: Determining image features by applying feature
  detectors.

• Gaussian Filtering: A convolution procedure in which the weighting of
  pixels in the template falls off with distance according to a
  Gaussian distribution.

• General Purpose Vision System: A vision system that is universally
  applicable. A system that is based on generic rather than specific
  knowledge (cf. Nevatia, 1982, p 1£8). A system that can deal with
  unfamiliar or unexpected input.

• Generalized Cone (Generalized Cylinder): A volumetric model defined by
  a space curve, called the spine or axis, and a planar cross section
  normal to the axis. A "sweeping rule" describes how the cross
  section changes along the axis.

• Generalized Ribbon (See Skeleton Representation): A planar region
  approximated by a medial line (axis) and the perpendicular distances to
  the boundary. The 2-D version of a generalized cone.

• Global Method: A method based on non-local aspects, e.g., region
  splitting by thresholding based on an image histogram.

• Goal Driven: Top-down approach.



• Gradient Space: A coordinate system (p, q) in which p and q are the
  rates of change in depth of the surface of an object in the scene
  along the x and y directions (the coordinates in the image plane).
  Thus (p, q, 1) has the direction of the surface normal.

• Gradient Vector: The orientation and magnitude of the rate of change in
  intensity at a point in the image.

• Graph (Also Relational Graph): An image representation in which nodes
  represent regions and arcs between nodes represent properties of
  and relations between those regions.

• Gray Level: A quantized measurement of image irradiance (brightness), or
  other pixel property.

• Heterarchical Approach: An image interpretation control structure in
  which no processing stage is in sole command, but in which each
  stage can control other stages as required.

• Heuristics: "Rules of thumb," knowledge or other techniques used to help
  guide a problem solution.

• Hierarchical Approach: An approach to vision based on a series of
  ordered processing levels in which the degree of abstraction
  increases as we proceed from the image level to the interpretation
  level.

• Higher Levels: The interpretative processing stages such as those
  involving object recognition and scene description, as opposed to the
  lower levels corresponding to the image and descriptive stages.

• Histogram: Frequency counts of the occurrence of each intensity (gray
  level) in an image.

• Hough Transform: A global parallel method for finding straight or
  curved lines, in which all points on a particular curve map into
  a single location in the transform space.

• Hueckel Operator: A method for finding edges in an image by fitting an
  intensity surface to the neighborhood of each pixel and selecting
  surface gradients above a chosen threshold value.

• Iconic: Image-like.

• Image: A projection of a scene into a plane. Usually represented as an
  array of brightness values.

• Image Processing: Transformation of an input image into an output image
  with more desirable properties, such as increased sharpness, less
  noise, and reduced geometric distortion. Signal processing is a
  1-D analog.
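
To make the Hough Transform entry above concrete, here is a small Python sketch for the straight-line case. It assumes the common normal parameterization rho = x cos(theta) + y sin(theta), with theta sampled in one-degree steps and rho rounded to the nearest integer; these choices, and the name `hough_lines`, are illustrative assumptions rather than anything prescribed by this glossary.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def hough_lines(points: Iterable[Tuple[float, float]],
                theta_steps: int = 180) -> Counter:
    """Accumulate votes in (rho, theta index) space for a set of edge points.

    Every point votes once for each sampled theta. Points lying on a common
    straight line all vote for the same (rho, theta) cell, so that cell
    collects one vote per point on the line.
    """
    accumulator: Counter = Counter()
    for x, y in points:
        for k in range(theta_steps):
            theta = math.pi * k / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(round(rho), k)] += 1
    return accumulator


# Four points on the line y = x, whose normal direction is 135 degrees.
votes = hough_lines([(0, 0), (1, 1), (2, 2), (3, 3)])
print(votes[(0, 135)], max(votes.values()))
```

With coarse quantization, neighboring cells can tie with the true peak, so practical detectors usually smooth the accumulator or pick local maxima rather than relying on a single largest cell.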



• Image Understanding (IU): Employs geometric modeling and the AI
  techniques of knowledge representation and cognitive processing
  to develop scene interpretations from image data. IU has dealt
  extensively with 3D objects. IU usually operates not on an
  image but on a symbolic representation of it. IU is somewhat
  synonymous with computer vision and scene analysis.

• Irradiance: The brightness of a point in the scene.

• Isomorphic Representation: A representation in which there is a one
  to one correspondence between the scene and its representation
  (e.g., an image or a map).

• Interpretation: Establishing a correspondence between the scene and a
  set of models. Assigning names to objects in a scene.

• Interpretation-Guided Segmentation: Using models to help guide image
  segmentation, by the process of extending partial matches.

• Intrinsic Characteristics: Properties inherent to the object, such
  as surface reflectance, orientation, incident illumination and
  range.

• Intrinsic Images: A set of arrays in registration with the image
  array. Each array corresponds to a particular intrinsic
  characteristic.

• Laplacian Operator: The sum of the second derivatives of the image
  intensity in the x and y directions is called the Laplacian.
  The Laplacian operator is used to find edge elements by finding
  points where the Laplacian is zero.

• Line: A thin connected set of points contrasting with neighbors on
  both sides. Line representations are extracted from edges.

• Line Detectors: Oriented operators for finding lines in an image.

• Line Followers: Techniques for extending lines currently being
  tracked.

• Low Level Features: Pixel-based features such as texture, regions,
  edges, lines, corners, etc.

• Model-based Vision System: A system that utilizes a priori models
  to derive a desired description of the original scene from an image.

• Module: A processing unit in a vision system.

• Monocular: Pertaining to an image taken from a single viewpoint.



• Optical Flow: The distribution of velocities of apparent movement
  in an image caused by smoothly changing brightness patterns.

• Pattern Recognition: A technique that classifies images into
  predetermined categories, usually using statistical methods.

• Perception: An active process in which hypotheses are formed about the
  nature of the environment, and sensory information is sought to
  confirm or refute hypotheses.

• Photometric Stereo: An approach in which the light source illuminating
  the scene is moved to different known locations, and the
  orientation of the surfaces deduced from the resulting intensity
  variations.

• Pixel (Picture Element): The individual elements in a digitized
  image array.

• Primal Sketch: A primitive description of the intensity changes
  in an image. It can be represented by a set of short line
  segments separating regions of different brightnesses.

• Pyramid: A hierarchical data structure that represents an image at
  several levels of resolution simultaneously.

• Quadtree: A representation obtained by recursively splitting an
  image into quadrants, until all pixels in a quadrant are uniform
  with respect to some feature (such as gray level).

• Recognition: A match between a description derived from an image and
  a description obtained from a stored model.

• Reflectance (Albedo): The ratio of total reflected to total incident
  illumination at each point.

• Region: A set of connected pixels that show a common property such
  as average gray level, color or texture, in an image.

• Region Growing: Process of initially partitioning an image into
  elementary regions with a common property (such as gray level)
  and then successively merging adjacent regions having sufficiently
  small differences in the selected property, until only regions
  with large differences between them remain.

• Registration: Processing images to correct geometrical and intensity
  distortions, relative translational and rotational shifts, and
  magnification differences between one image and another or
  between an image and a reference map. When registered, there is
  a one to one correspondence between a set of points in the image
  and in the reference.

• Relaxation Approach: An iterative problem solving approach in which
  initial conditions are propagated utilizing constraints until all
  goal conditions are adequately satisfied.





• Relational Graph: See "graph."

• Representation: A symbolic description or model of objects in the
  image or scene domain.

• Run-length Encoding: A data compression technique in which an image
  is raster-scanned and only the lengths of runs of consecutive
  pixels with the same property are stored.

• Scene: The 3-D environment from which the image is generated.

• Scene Analysis: The process of seeking information about a 3-D
  scene from information derived from a 2-D image. It usually
  involves the transformation of simple features into abstract
  descriptions.

• Segmentation: The process of breaking up an image into regions (each
  with uniform attributes) usually corresponding to surfaces of
  objects or entities in the scene.

• Semantic Interpretation: Producing an application-dependent scene
  description from a feature set (representation) derived from
  the image.

• Semantic Network: A representation of objects and relationships
  between objects as a graph structure of nodes and labeled arcs.
  See "graph."

• Skeleton Representation (See "Generalized Ribbons"): A representation
  of a 2-D region by the medial line and the perpendicular distance
  to the boundary at each point along it.

• Sketch Map: A rough line drawing of a scene.

• Sobel Operator: A popular convolution operator for detecting edges.
  Similar to other difference operators such as the Prewitt
  Operator.

• Spectral Analysis: Interpreting image points in terms of their
  response to various light frequencies (colors).

• Splines (B-Splines): Piecewise continuous polynomial curves used to
  approximate a curve.

• Stereoscopic Approach: Use of triangulation between two or more views,
  obtained from different positions, to determine range or depth.

• Structured Light: Sheets of light and other projective light
  configurations used to directly determine shape and/or range from
  the observed configuration that the projected line, circle,
  grid, etc. makes as it intersects the object.
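
The Run-length Encoding entry in this glossary can be illustrated with a short Python sketch for one raster line. Storing (value, length) pairs is an assumption made here for generality; for a binary image whose first pixel value is known, the lengths alone would suffice, as the definition states. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def run_length_encode(row: Sequence[int]) -> List[Tuple[int, int]]:
    """Encode one raster line as (pixel value, run length) pairs."""
    runs: List[List[int]] = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [(value, length) for value, length in runs]


def run_length_decode(runs: Sequence[Tuple[int, int]]) -> List[int]:
    """Rebuild the raster line from its runs."""
    row: List[int] = []
    for value, length in runs:
        row.extend([value] * length)
    return row


line = [0, 0, 1, 1, 1, 0]
encoded = run_length_encode(line)
print(encoded)
print(run_length_decode(encoded) == line)
```

The saving depends on the image: long uniform runs, as in binary silhouettes, compress well, while a noisy gray scale line can take more storage encoded than raw.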



• Symbolic Description: Non-iconic scene descriptions such as graph
  representations.

• Syntactic Analysis: Recognizing images by a "parsing" process as
  being built up of primitive elements.

• Template: A prototype iconic model that can be used directly to
  match to image characteristics for object recognition or
  inspection.

• Template Matching: Correlating an object template with an observed
  image field - usually performed at the pixel level.

• Texture: A local variation in pixel values that repeats in a regular
  or random way across a portion of an image or object.

• Thresholding: Separating regions of an image based on pixel values
  above or below a chosen (threshold) value.

• Top Down Approach (Goal Directed): An approach in which the
  interpretation stage is guided in its analysis by trial or test
  descriptions of a scene. Sometimes referred to as "Hypothesize
  and Test."

• Tracking: Processing sequences of images in real time to derive a
  description of the motion of one or more objects in a scene.

• Vertex: The point on a polyhedron common to three or more sides.

• Viewpoint: The position (or direction) from which the scene is observed.

• Vision: The process of understanding the environment based on image data.

• Wire Frame Model: A 3-D model, similar to a wireframe, in which the
  object is defined in terms of edges and vertices.

• Window: A selected portion (usually square or rectangular) of an image.

• 2-D: Two dimensional.

• 2.5-D Sketch: A scene representation proposed by Marr (1978), consisting
  of surface distances and orientations.

• 3-D: Three dimensional.
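
As an illustration of the Template Matching entry in this glossary, the Python sketch below slides a template over an image at the pixel level and reports the best position. Note the substitution: it scores each position by the sum of squared differences (lower is better) rather than by a correlation measure, simply because that is the shortest correct form to show; the function name `best_match` and its return format are assumptions for this example.

```python
from typing import List, Tuple

Image = List[List[float]]


def best_match(image: Image, template: Image) -> Tuple[float, int, int]:
    """Return (score, row, column) of the best template position.

    The score is the sum of squared gray level differences between the
    template and the image window whose top-left corner is (row, column).
    """
    image_h, image_w = len(image), len(image[0])
    templ_h, templ_w = len(template), len(template[0])
    if templ_h > image_h or templ_w > image_w:
        raise ValueError("template is larger than the image")

    best = None
    for r in range(image_h - templ_h + 1):
        for c in range(image_w - templ_w + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(templ_h)
                for j in range(templ_w)
            )
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


scene = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]
pattern = [[1, 2], [3, 4]]
print(best_match(scene, pattern))
```

The exhaustive search costs one template comparison per candidate position, which is why the commercial systems described in this report restrict matching to small windows or require parts to be presented in a known orientation.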


153 




APPENDIX L 


SOME PUBLICATION SOURCES FOR TUTUER INFORMATION 






A. Recent Books 


Ballard, D.H., and Brown, C.M., Computer Vision, Englewood Cliffs: 
Prentice Hall, 1982. 

Brady, M. (Ed.), Computer Vision, Amsterdam: North Holland, 1981. 

Cohen, P.R., and Feigenbaum, E.A., "Vision," The Handbook of Artificial 
Intelligence, Vol. III, Los Altos, CA: Kaufmann, 1982, pp. 125-321. 

Haralick, R. (Ed.), Pictorial Data Analysis, Berlin: Springer-Verlag, 
1982. 

Marr, D.C., Vision, San Francisco: W.H. Freeman, 1982. 

Nevatia, R., Machine Perception, Englewood Cliffs: Prentice Hall, 1982. 

Rosenfeld, A., and Kak, A.C., Digital Picture Processing, 2nd Ed., 
Vols. 1 and 2, New York: Acad. Pr., 1982. 

Pavlidis, T., Algorithms for Graphics and Image Processing, Rockville, Md.: 
Computer Science Press, 1982. 

Hord, R.M., Digital Image Processing of Remotely Sensed Data, New York: 
Acad. Pr., 1982. 






B. Periodic Conferences and Workshop Proceedings 


DARPA Image Understanding Workshops - Sci. Applications Inc. 

International Conferences on Pattern Recognition - IEEE Computer Society 

National Conferences on Artificial Intelligence - AAAI 

IEEE Workshops on Computer Vision - IEEE Computer Society 

International Joint Conferences on Artificial Intelligence 

International Conferences on Robot Vision - The Industrial Robot Journal 
and Sensor Review 

Workshops on Industrial Applications of Computer Vision - IEEE Computer 
Society 

SPIE Technical Symposia - Society of Photo-Optical Instrumentation 
Engineers 

NSF Workshops (Aperiodic workshops on various topics in computer vision). 





C. Periodicals 

Computer Graphics and Image Processing 

Artificial Intelligence 

IEEE Transactions on Pattern Analysis and Machine Intelligence 

Pattern Recognition 

International Journal of Robotics Research 

IEEE Transactions on Systems, Man and Cybernetics 



D. Some Recent Bibliographies, Surveys and Syntheses 


Ahuja, N. and Schachter, B., "Image Models," ACM Computing Surveys, 
Vol. 13, No. 4, Dec. 1981, pp. 373-393. 

Barrow, H.G. and Tenenbaum, J.M., "Computational Vision," Proc. of the 
IEEE, Vol. 69, No. 5, May 1981, pp. 572-595. 

Binford, T.O., "Survey of Model-Based Image Analysis Systems," Robotics 
Research, Vol. 1, No. 1, Spring 1982, pp. 18-64. 

Brady, M., "Computational Approaches to Image Understanding," Computing 
Surveys, Vol. 14, No. 1, March 1982, pp. 3-71. 

Chin, R.T., "Automated Visual Inspection Techniques and Applications: A 
Bibliography," Pattern Recognition, Vol. 15, No. 4, 1982, 
pp. 328-357. 

Gennery, D., Cunningham, R., Saund, E., High, J., and Ruoff, C., 
Computer Vision, JPL 81-92, JPL, Pasadena, CA, Nov. 1, 1981. 

Kruger, R.P. and Thompson, W.B., "A Technical and Economic Assessment of 
Computer Vision for Inspection and Robotic Assembly," Proc. of the 
IEEE, Vol. 69, No. 12, Dec. 1981, pp. 1524-1538. 

Rosenfeld, A., Picture Processing, U. of Md. C.S. Center, College 
Park, Md.: A yearly bibliography of computer processing of 
pictorial information. 

Srihari, S.N., "Representation of 3-D Digital Images," ACM Computing 
Surveys, Vol. 13, No. 4, Dec. 1981, pp. 399-424. 






End of Document 



